@node Low-Level I/O, File System Interface, I/O on Streams, Top
|
|
@c %MENU% Low-level, less portable I/O
|
|
@chapter Low-Level Input/Output
|
|
|
|
This chapter describes functions for performing low-level input/output
|
|
operations on file descriptors. These functions include the primitives
|
|
for the higher-level I/O functions described in @ref{I/O on Streams}, as
|
|
well as functions for performing low-level control operations for which
|
|
there are no equivalents on streams.
|
|
|
|
Stream-level I/O is more flexible and usually more convenient;
|
|
therefore, programmers generally use the descriptor-level functions only
|
|
when necessary. These are some of the usual reasons:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
For reading binary files in large chunks.
|
|
|
|
@item
|
|
For reading an entire file into core before parsing it.
|
|
|
|
@item
|
|
To perform operations other than data transfer, which can only be done
|
|
with a descriptor. (You can use @code{fileno} to get the descriptor
|
|
corresponding to a stream.)
|
|
|
|
@item
|
|
To pass descriptors to a child process. (The child can create its own
|
|
stream to use a descriptor that it inherits, but cannot inherit a stream
|
|
directly.)
|
|
@end itemize
|
|
|
|
@menu
|
|
* Opening and Closing Files:: How to open and close file
|
|
descriptors.
|
|
* I/O Primitives:: Reading and writing data.
|
|
* File Position Primitive:: Setting a descriptor's file
|
|
position.
|
|
* Descriptors and Streams:: Converting descriptor to stream
|
|
or vice-versa.
|
|
* Stream/Descriptor Precautions:: Precautions needed if you use both
|
|
descriptors and streams.
|
|
* Scatter-Gather:: Fast I/O to discontinuous buffers.
|
|
* Memory-mapped I/O:: Using files like memory.
|
|
* Waiting for I/O:: How to check for input or output
|
|
on multiple file descriptors.
|
|
* Synchronizing I/O:: Making sure all I/O actions completed.
|
|
* Asynchronous I/O:: Perform I/O in parallel.
|
|
* Control Operations:: Various other operations on file
|
|
descriptors.
|
|
* Duplicating Descriptors:: Fcntl commands for duplicating
|
|
file descriptors.
|
|
* Descriptor Flags:: Fcntl commands for manipulating
|
|
flags associated with file
|
|
descriptors.
|
|
* File Status Flags:: Fcntl commands for manipulating
|
|
flags associated with open files.
|
|
* File Locks:: Fcntl commands for implementing
|
|
file locking.
|
|
* Interrupt Input:: Getting an asynchronous signal when
|
|
input arrives.
|
|
* IOCTLs:: Generic I/O Control operations.
|
|
@end menu
|
|
|
|
|
|
@node Opening and Closing Files
|
|
@section Opening and Closing Files
|
|
|
|
@cindex opening a file descriptor
|
|
@cindex closing a file descriptor
|
|
This section describes the primitives for opening and closing files
|
|
using file descriptors. The @code{open} and @code{creat} functions are
|
|
declared in the header file @file{fcntl.h}, while @code{close} is
|
|
declared in @file{unistd.h}.
|
|
@pindex unistd.h
|
|
@pindex fcntl.h
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypefun int open (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}])
|
|
The @code{open} function creates and returns a new file descriptor
|
|
for the file named by @var{filename}. Initially, the file position
|
|
indicator for the file is at the beginning of the file. The argument
|
|
@var{mode} is used only when a file is created, but it doesn't hurt
|
|
to supply the argument in any case.
|
|
|
|
The @var{flags} argument controls how the file is to be opened. This is
|
|
a bit mask; you create the value by the bitwise OR of the appropriate
|
|
parameters (using the @samp{|} operator in C).
|
|
@xref{File Status Flags}, for the parameters available.
|
|
|
|
The normal return value from @code{open} is a non-negative integer file
|
|
descriptor. In the case of an error, a value of @math{-1} is returned
|
|
instead. In addition to the usual file name errors (@pxref{File
|
|
Name Errors}), the following @code{errno} error conditions are defined
|
|
for this function:
|
|
|
|
@table @code
|
|
@item EACCES
|
|
The file exists but is not readable/writeable as requested by the @var{flags}
|
|
argument, or the file does not exist and the directory is unwriteable so
|
|
it cannot be created.
|
|
|
|
@item EEXIST
|
|
Both @code{O_CREAT} and @code{O_EXCL} are set, and the named file already
|
|
exists.
|
|
|
|
@item EINTR
|
|
The @code{open} operation was interrupted by a signal.
|
|
@xref{Interrupted Primitives}.
|
|
|
|
@item EISDIR
|
|
The @var{flags} argument specified write access, and the file is a directory.
|
|
|
|
@item EMFILE
|
|
The process has too many files open.
|
|
The maximum number of file descriptors is controlled by the
|
|
@code{RLIMIT_NOFILE} resource limit; @pxref{Limits on Resources}.
|
|
|
|
@item ENFILE
|
|
The entire system, or perhaps the file system which contains the
|
|
directory, cannot support any additional open files at the moment.
|
|
(This problem cannot happen on the GNU system.)
|
|
|
|
@item ENOENT
|
|
The named file does not exist, and @code{O_CREAT} is not specified.
|
|
|
|
@item ENOSPC
|
|
The directory or file system that would contain the new file cannot be
|
|
extended, because there is no disk space left.
|
|
|
|
@item ENXIO
|
|
@code{O_NONBLOCK} and @code{O_WRONLY} are both set in the @var{flags}
|
|
argument, the file named by @var{filename} is a FIFO (@pxref{Pipes and
|
|
FIFOs}), and no process has the file open for reading.
|
|
|
|
@item EROFS
|
|
The file resides on a read-only file system and any of @w{@code{O_WRONLY}},
|
|
@code{O_RDWR}, and @code{O_TRUNC} are set in the @var{flags} argument,
|
|
or @code{O_CREAT} is set and the file does not already exist.
|
|
@end table
|
|
|
|
@c !!! umask
|
|
|
|
If the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32 bit machine, the function @code{open} returns a file descriptor
opened in the large file mode, which enables the file handling
functions to use files up to @math{2^63} bytes in size and offsets from
@math{-2^63} to @math{2^63}. This happens transparently for the user
since all of the low-level file handling functions are replaced in the
same way.
|
|
|
|
This function is a cancellation point in multi-threaded programs. This
|
|
is a problem if the thread allocates some resources (like memory, file
|
|
descriptors, semaphores or whatever) at the time @code{open} is
|
|
called. If the thread gets cancelled these resources stay allocated
|
|
until the program ends. To avoid this, calls to @code{open} should be
|
|
protected using cancellation handlers.
|
|
@c ref pthread_cleanup_push / pthread_cleanup_pop
|
|
|
|
The @code{open} function is the underlying primitive for the @code{fopen}
|
|
and @code{freopen} functions, which create streams.
|
|
@end deftypefun
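
As a minimal sketch of the error handling described for @code{open}, the
following helper creates a file that must not already exist and reports
the reason if the call fails. The name @code{create_new_file} and the
permission bits are assumptions made for illustration; they are not part
of the library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: create FILENAME for writing, failing with
   EEXIST if it already exists.  Returns a descriptor, or -1 with
   errno set.  */
int
create_new_file (const char *filename)
{
  /* The mode argument matters here because O_CREAT may create the
     file.  Owner read/write is an arbitrary choice for this sketch.  */
  int fd = open (filename, O_WRONLY | O_CREAT | O_EXCL,
                 S_IRUSR | S_IWUSR);
  if (fd == -1)
    {
      /* Save errno, since fprintf is allowed to change it.  */
      int saved_errno = errno;
      fprintf (stderr, "%s: %s\n", filename, strerror (saved_errno));
      errno = saved_errno;
    }
  return fd;
}
```

A caller can test @code{errno} against @code{EEXIST} after a failure to
distinguish an existing file from other errors.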
|
|
|
|
@comment fcntl.h
|
|
@comment Unix98
|
|
@deftypefun int open64 (const char *@var{filename}, int @var{flags}[, mode_t @var{mode}])
|
|
This function is similar to @code{open}. It returns a file descriptor
|
|
which can be used to access the file named by @var{filename}. The only
|
|
difference is that on 32 bit systems the file is opened in the
|
|
large file mode. I.e., file length and file offsets can exceed 31 bits.
|
|
|
|
When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is actually available under the name @code{open}. I.e., the
|
|
new, extended API using 64 bit file sizes and offsets transparently
|
|
replaces the old API.
|
|
@end deftypefun
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypefn {Obsolete function} int creat (const char *@var{filename}, mode_t @var{mode})
|
|
This function is obsolete. The call:
|
|
|
|
@smallexample
|
|
creat (@var{filename}, @var{mode})
|
|
@end smallexample
|
|
|
|
@noindent
|
|
is equivalent to:
|
|
|
|
@smallexample
|
|
open (@var{filename}, O_WRONLY | O_CREAT | O_TRUNC, @var{mode})
|
|
@end smallexample
|
|
|
|
If the sources are compiled with @code{_FILE_OFFSET_BITS == 64} on a
32 bit machine, the function @code{creat} returns a file descriptor
opened in the large file mode, which enables the file handling
functions to use files up to @math{2^63} bytes in size and offsets from
@math{-2^63} to @math{2^63}. This happens transparently for the user
since all of the low-level file handling functions are replaced in the
same way.
|
|
@end deftypefn
|
|
|
|
@comment fcntl.h
|
|
@comment Unix98
|
|
@deftypefn {Obsolete function} int creat64 (const char *@var{filename}, mode_t @var{mode})
|
|
This function is similar to @code{creat}. It returns a file descriptor
|
|
which can be used to access the file named by @var{filename}. The only
|
|
difference is that on 32 bit systems the file is opened in the
|
|
large file mode. I.e., file length and file offsets can exceed 31 bits.
|
|
|
|
Operations on this file descriptor that take or return a file offset
should use the counterparts named @code{*64}, e.g., @code{lseek64},
rather than the normal functions.
|
|
|
|
When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is actually available under the name @code{creat}. I.e., the
|
|
new, extended API using 64 bit file sizes and offsets transparently
|
|
replaces the old API.
|
|
@end deftypefn
|
|
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@deftypefun int close (int @var{filedes})
|
|
The function @code{close} closes the file descriptor @var{filedes}.
|
|
Closing a file has the following consequences:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
The file descriptor is deallocated.
|
|
|
|
@item
|
|
Any record locks owned by the process on the file are unlocked.
|
|
|
|
@item
|
|
When all file descriptors associated with a pipe or FIFO have been closed,
|
|
any unread data is discarded.
|
|
@end itemize
|
|
|
|
This function is a cancellation point in multi-threaded programs. This
|
|
is a problem if the thread allocates some resources (like memory, file
|
|
descriptors, semaphores or whatever) at the time @code{close} is
|
|
called. If the thread gets cancelled these resources stay allocated
|
|
until the program ends. To avoid this, calls to @code{close} should be
|
|
protected using cancellation handlers.
|
|
@c ref pthread_cleanup_push / pthread_cleanup_pop
|
|
|
|
The normal return value from @code{close} is @math{0}; a value of @math{-1}
|
|
is returned in case of failure. The following @code{errno} error
|
|
conditions are defined for this function:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{filedes} argument is not a valid file descriptor.
|
|
|
|
@item EINTR
|
|
The @code{close} call was interrupted by a signal.
|
|
@xref{Interrupted Primitives}.
|
|
Here is an example of how to handle @code{EINTR} properly:
|
|
|
|
@smallexample
|
|
TEMP_FAILURE_RETRY (close (desc));
|
|
@end smallexample
|
|
|
|
@item ENOSPC
|
|
@itemx EIO
|
|
@itemx EDQUOT
|
|
When the file is accessed by NFS, these errors from @code{write} can sometimes
|
|
not be detected until @code{close}. @xref{I/O Primitives}, for details
|
|
on their meaning.
|
|
@end table
|
|
|
|
Please note that there is @emph{no} separate @code{close64} function.
|
|
This is not necessary since this function neither determines nor depends
|
|
on the mode of the file. The kernel which performs the @code{close}
|
|
operation knows which mode the descriptor is used for and can handle
|
|
this situation.
|
|
@end deftypefun
|
|
|
|
To close a stream, call @code{fclose} (@pxref{Closing Streams}) instead
|
|
of trying to close its underlying file descriptor with @code{close}.
|
|
This flushes any buffered output and updates the stream object to
|
|
indicate that it is closed.
|
|
|
|
@node I/O Primitives
|
|
@section Input and Output Primitives
|
|
|
|
This section describes the functions for performing primitive input and
|
|
output operations on file descriptors: @code{read}, @code{write}, and
|
|
@code{lseek}. These functions are declared in the header file
|
|
@file{unistd.h}.
|
|
@pindex unistd.h
|
|
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@deftp {Data Type} ssize_t
|
|
This data type is used to represent the sizes of blocks that can be
|
|
read or written in a single operation. It is similar to @code{size_t},
|
|
but must be a signed type.
|
|
@end deftp
|
|
|
|
@cindex reading from a file descriptor
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@deftypefun ssize_t read (int @var{filedes}, void *@var{buffer}, size_t @var{size})
|
|
The @code{read} function reads up to @var{size} bytes from the file
|
|
with descriptor @var{filedes}, storing the results in the @var{buffer}.
|
|
(This is not necessarily a character string, and no terminating null
|
|
character is added.)
|
|
|
|
@cindex end-of-file, on a file descriptor
|
|
The return value is the number of bytes actually read. This might be
|
|
less than @var{size}; for example, if there aren't that many bytes left
|
|
in the file or if there aren't that many bytes immediately available.
|
|
The exact behavior depends on what kind of file it is. Note that
|
|
reading less than @var{size} bytes is not an error.
|
|
|
|
A value of zero indicates end-of-file (except if the value of the
|
|
@var{size} argument is also zero). This is not considered an error.
|
|
If you keep calling @code{read} while at end-of-file, it will keep
|
|
returning zero and doing nothing else.
|
|
|
|
If @code{read} returns at least one character, there is no way you can
|
|
tell whether end-of-file was reached. But if you did reach the end, the
|
|
next read will return zero.
|
|
|
|
In case of an error, @code{read} returns @math{-1}. The following
|
|
@code{errno} error conditions are defined for this function:
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
Normally, when no input is immediately available, @code{read} waits for
|
|
some input. But if the @code{O_NONBLOCK} flag is set for the file
|
|
(@pxref{File Status Flags}), @code{read} returns immediately without
|
|
reading any data, and reports this error.
|
|
|
|
@strong{Compatibility Note:} Most versions of BSD Unix use a different
|
|
error code for this: @code{EWOULDBLOCK}. In the GNU library,
|
|
@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter
|
|
which name you use.
|
|
|
|
On some systems, reading a large amount of data from a character special
|
|
file can also fail with @code{EAGAIN} if the kernel cannot find enough
|
|
physical memory to lock down the user's pages. This is limited to
|
|
devices that transfer with direct memory access into the user's memory,
|
|
which means it does not include terminals, since they always use
|
|
separate buffers inside the kernel. This problem never happens in the
|
|
GNU system.
|
|
|
|
Any condition that could result in @code{EAGAIN} can instead result in a
|
|
successful @code{read} which returns fewer bytes than requested.
|
|
Calling @code{read} again immediately would result in @code{EAGAIN}.
|
|
|
|
@item EBADF
|
|
The @var{filedes} argument is not a valid file descriptor,
|
|
or is not open for reading.
|
|
|
|
@item EINTR
|
|
@code{read} was interrupted by a signal while it was waiting for input.
|
|
@xref{Interrupted Primitives}. A signal will not necessarily cause
|
|
@code{read} to return @code{EINTR}; it may instead result in a
|
|
successful @code{read} which returns fewer bytes than requested.
|
|
|
|
@item EIO
|
|
For many devices, and for disk files, this error code indicates
|
|
a hardware error.
|
|
|
|
@code{EIO} also occurs when a background process tries to read from the
|
|
controlling terminal, and the normal action of stopping the process by
|
|
sending it a @code{SIGTTIN} signal isn't working. This might happen if
|
|
the signal is being blocked or ignored, or because the process group is
|
|
orphaned. @xref{Job Control}, for more information about job control,
|
|
and @ref{Signal Handling}, for information about signals.
|
|
@end table
|
|
|
|
Please note that there is no function named @code{read64}. This is not
|
|
necessary since this function does not directly modify or handle the
|
|
possibly wide file offset. Since the kernel handles this state
|
|
internally, the @code{read} function can be used for all cases.
|
|
|
|
This function is a cancellation point in multi-threaded programs. This
|
|
is a problem if the thread allocates some resources (like memory, file
|
|
descriptors, semaphores or whatever) at the time @code{read} is
|
|
called. If the thread gets cancelled these resources stay allocated
|
|
until the program ends. To avoid this, calls to @code{read} should be
|
|
protected using cancellation handlers.
|
|
@c ref pthread_cleanup_push / pthread_cleanup_pop
|
|
|
|
The @code{read} function is the underlying primitive for all of the
|
|
functions that read from streams, such as @code{fgetc}.
|
|
@end deftypefun
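
Since @code{read} may return fewer bytes than requested, code that needs
a full buffer usually calls it in a loop. The following is one possible
sketch; @code{read_full} is a hypothetical name, and the choice to retry
on @code{EINTR} is an assumption about what the caller wants.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read until SIZE bytes have arrived or
   end-of-file is reached.  Returns the number of bytes stored in
   BUFFER (less than SIZE only at end-of-file), or -1 on error.  */
ssize_t
read_full (int filedes, void *buffer, size_t size)
{
  char *p = buffer;
  size_t total = 0;

  while (total < size)
    {
      ssize_t n = read (filedes, p + total, size - total);
      if (n == 0)
        break;                  /* End of file.  */
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before any data; retry.  */
          return -1;
        }
      total += (size_t) n;
    }
  return (ssize_t) total;
}
```

On a descriptor with @code{O_NONBLOCK} set, this loop returns
@math{-1} with @code{EAGAIN} as soon as no input is available, so it is
intended for blocking descriptors.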
|
|
|
|
@comment unistd.h
|
|
@comment Unix98
|
|
@deftypefun ssize_t pread (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off_t @var{offset})
|
|
The @code{pread} function is similar to the @code{read} function. The
|
|
first three arguments are identical, and the return values and error
|
|
codes also correspond.
|
|
|
|
The difference is the fourth argument and its handling. The data block
|
|
is not read from the current position of the file descriptor
|
|
@var{filedes}. Instead the data is read from the file starting at
|
|
position @var{offset}. The position of the file descriptor itself is
|
|
not affected by the operation. The value is the same as before the call.
|
|
|
|
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
|
|
@code{pread} function is in fact @code{pread64} and the type
|
|
@code{off_t} has 64 bits, which makes it possible to handle files up to
|
|
@math{2^63} bytes in length.
|
|
|
|
The return value of @code{pread} describes the number of bytes read.
|
|
In the error case it returns @math{-1} like @code{read} does and the
|
|
error codes are also the same, with these additions:
|
|
|
|
@table @code
|
|
@item EINVAL
|
|
The value given for @var{offset} is negative and therefore illegal.
|
|
|
|
@item ESPIPE
|
|
The file descriptor @var{filedes} is associated with a pipe or a FIFO and
|
|
this device does not allow positioning of the file pointer.
|
|
@end table
|
|
|
|
The function is an extension defined in the Single Unix Specification
|
|
version 2.
|
|
@end deftypefun
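
One common use of @code{pread} is fetching fixed-size records by index
without disturbing the descriptor's file position. The sketch below
assumes a file made of equal-sized records; the helper name
@code{read_record} and that file layout are illustrative only.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read record number INDEX (counting from zero)
   of RECORD_SIZE bytes into RECORD.  Returns what pread returns: the
   number of bytes read (possibly short, or 0 at end-of-file), or -1
   on error.  The file position of FILEDES is left unchanged.  */
ssize_t
read_record (int filedes, void *record, size_t record_size, off_t index)
{
  return pread (filedes, record, record_size,
                index * (off_t) record_size);
}
```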
|
|
|
|
@comment unistd.h
|
|
@comment Unix98
|
|
@deftypefun ssize_t pread64 (int @var{filedes}, void *@var{buffer}, size_t @var{size}, off64_t @var{offset})
|
|
This function is similar to the @code{pread} function. The difference
|
|
is that the @var{offset} parameter is of type @code{off64_t} instead of
|
|
@code{off_t} which makes it possible on 32 bit machines to address
|
|
files larger than @math{2^31} bytes and up to @math{2^63} bytes. The
|
|
file descriptor @var{filedes} must be opened using @code{open64} since
|
|
otherwise the large offsets possible with @code{off64_t} will lead to
|
|
errors with a descriptor in small file mode.
|
|
|
|
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a
|
|
32 bit machine this function is actually available under the name
|
|
@code{pread} and so transparently replaces the 32 bit interface.
|
|
@end deftypefun
|
|
|
|
@cindex writing to a file descriptor
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@deftypefun ssize_t write (int @var{filedes}, const void *@var{buffer}, size_t @var{size})
|
|
The @code{write} function writes up to @var{size} bytes from
|
|
@var{buffer} to the file with descriptor @var{filedes}. The data in
|
|
@var{buffer} is not necessarily a character string and a null character is
|
|
output like any other character.
|
|
|
|
The return value is the number of bytes actually written. This may be
|
|
@var{size}, but can always be smaller. Your program should always call
|
|
@code{write} in a loop, iterating until all the data is written.
|
|
|
|
Once @code{write} returns, the data is enqueued to be written and can be
|
|
read back right away, but it is not necessarily written out to permanent
|
|
storage immediately. You can use @code{fsync} when you need to be sure
|
|
your data has been permanently stored before continuing. (It is more
|
|
efficient for the system to batch up consecutive writes and do them all
|
|
at once when convenient. Normally they will be written to disk
|
|
within a minute or less.) Modern systems provide another function
|
|
@code{fdatasync} which guarantees integrity only for the file data and
|
|
is therefore faster.
|
|
@c !!! xref fsync, fdatasync
|
|
You can use the @code{O_FSYNC} open mode to make @code{write} always
|
|
store the data to disk before returning; @pxref{Operating Modes}.
|
|
|
|
In the case of an error, @code{write} returns @math{-1}. The following
|
|
@code{errno} error conditions are defined for this function:
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
Normally, @code{write} blocks until the write operation is complete.
|
|
But if the @code{O_NONBLOCK} flag is set for the file (@pxref{Control
|
|
Operations}), it returns immediately without writing any data and
|
|
reports this error. An example of a situation that might cause the
|
|
process to block on output is writing to a terminal device that supports
|
|
flow control, where output has been suspended by receipt of a STOP
|
|
character.
|
|
|
|
@strong{Compatibility Note:} Most versions of BSD Unix use a different
|
|
error code for this: @code{EWOULDBLOCK}. In the GNU library,
|
|
@code{EWOULDBLOCK} is an alias for @code{EAGAIN}, so it doesn't matter
|
|
which name you use.
|
|
|
|
On some systems, writing a large amount of data to a character special
file can also fail with @code{EAGAIN} if the kernel cannot find enough
physical memory to lock down the user's pages. This is limited to
devices that transfer with direct memory access from the user's memory,
which means it does not include terminals, since they always use
separate buffers inside the kernel. This problem does not arise in the
GNU system.
|
|
|
|
@item EBADF
|
|
The @var{filedes} argument is not a valid file descriptor,
|
|
or is not open for writing.
|
|
|
|
@item EFBIG
|
|
The size of the file would become larger than the implementation can support.
|
|
|
|
@item EINTR
|
|
The @code{write} operation was interrupted by a signal while it was
|
|
blocked waiting for completion. A signal will not necessarily cause
|
|
@code{write} to return @code{EINTR}; it may instead result in a
|
|
successful @code{write} which writes fewer bytes than requested.
|
|
@xref{Interrupted Primitives}.
|
|
|
|
@item EIO
|
|
For many devices, and for disk files, this error code indicates
|
|
a hardware error.
|
|
|
|
@item ENOSPC
|
|
The device containing the file is full.
|
|
|
|
@item EPIPE
|
|
This error is returned when you try to write to a pipe or FIFO that
|
|
isn't open for reading by any process. When this happens, a @code{SIGPIPE}
|
|
signal is also sent to the process; see @ref{Signal Handling}.
|
|
@end table
|
|
|
|
Unless you have arranged to prevent @code{EINTR} failures, you should
|
|
check @code{errno} after each failing call to @code{write}, and if the
|
|
error was @code{EINTR}, you should simply repeat the call.
|
|
@xref{Interrupted Primitives}. The easy way to do this is with the
|
|
macro @code{TEMP_FAILURE_RETRY}, as follows:
|
|
|
|
@smallexample
|
|
nbytes = TEMP_FAILURE_RETRY (write (desc, buffer, count));
|
|
@end smallexample
|
|
|
|
Please note that there is no function named @code{write64}. This is not
|
|
necessary since this function does not directly modify or handle the
|
|
possibly wide file offset. Since the kernel handles this state
|
|
internally, the @code{write} function can be used for all cases.
|
|
|
|
This function is a cancellation point in multi-threaded programs. This
|
|
is a problem if the thread allocates some resources (like memory, file
|
|
descriptors, semaphores or whatever) at the time @code{write} is
|
|
called. If the thread gets cancelled these resources stay allocated
|
|
until the program ends. To avoid this, calls to @code{write} should be
|
|
protected using cancellation handlers.
|
|
@c ref pthread_cleanup_push / pthread_cleanup_pop
|
|
|
|
The @code{write} function is the underlying primitive for all of the
|
|
functions that write to streams, such as @code{fputc}.
|
|
@end deftypefun
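
The advice to call @code{write} in a loop until all the data is written
can be sketched as follows. The helper name @code{write_all} is
hypothetical, and retrying on @code{EINTR} is an assumption about the
caller's needs; the loop is meant for blocking descriptors.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all SIZE bytes of BUFFER to FILEDES.
   Returns 0 once everything has been written, or -1 on error (some
   of the data may already have been written in that case).  */
int
write_all (int filedes, const void *buffer, size_t size)
{
  const char *p = buffer;

  while (size > 0)
    {
      ssize_t n = write (filedes, p, size);
      if (n == -1)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before writing; retry.  */
          return -1;
        }
      /* A short write is not an error; continue with the rest.  */
      p += n;
      size -= (size_t) n;
    }
  return 0;
}
```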
|
|
|
|
@comment unistd.h
|
|
@comment Unix98
|
|
@deftypefun ssize_t pwrite (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off_t @var{offset})
|
|
The @code{pwrite} function is similar to the @code{write} function. The
|
|
first three arguments are identical, and the return values and error codes
|
|
also correspond.
|
|
|
|
The difference is the fourth argument and its handling. The data block
|
|
is not written to the current position of the file descriptor
|
|
@var{filedes}. Instead the data is written to the file starting at
|
|
position @var{offset}. The position of the file descriptor itself is
|
|
not affected by the operation. The value is the same as before the call.
|
|
|
|
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
|
|
@code{pwrite} function is in fact @code{pwrite64} and the type
|
|
@code{off_t} has 64 bits, which makes it possible to handle files up to
|
|
@math{2^63} bytes in length.
|
|
|
|
The return value of @code{pwrite} describes the number of bytes written.
|
|
In the error case it returns @math{-1} like @code{write} does and the
|
|
error codes are also the same, with these additions:
|
|
|
|
@table @code
|
|
@item EINVAL
|
|
The value given for @var{offset} is negative and therefore illegal.
|
|
|
|
@item ESPIPE
|
|
The file descriptor @var{filedes} is associated with a pipe or a FIFO and
|
|
this device does not allow positioning of the file pointer.
|
|
@end table
|
|
|
|
The function is an extension defined in the Single Unix Specification
|
|
version 2.
|
|
@end deftypefun
|
|
|
|
@comment unistd.h
|
|
@comment Unix98
|
|
@deftypefun ssize_t pwrite64 (int @var{filedes}, const void *@var{buffer}, size_t @var{size}, off64_t @var{offset})
|
|
This function is similar to the @code{pwrite} function. The difference
|
|
is that the @var{offset} parameter is of type @code{off64_t} instead of
|
|
@code{off_t} which makes it possible on 32 bit machines to address
|
|
files larger than @math{2^31} bytes and up to @math{2^63} bytes. The
|
|
file descriptor @var{filedes} must be opened using @code{open64} since
|
|
otherwise the large offsets possible with @code{off64_t} will lead to
|
|
errors with a descriptor in small file mode.
|
|
|
|
When the source file is compiled using @code{_FILE_OFFSET_BITS == 64} on a
|
|
32 bit machine this function is actually available under the name
|
|
@code{pwrite} and so transparently replaces the 32 bit interface.
|
|
@end deftypefun
|
|
|
|
|
|
@node File Position Primitive
|
|
@section Setting the File Position of a Descriptor
|
|
|
|
Just as you can set the file position of a stream with @code{fseek}, you
|
|
can set the file position of a descriptor with @code{lseek}. This
|
|
specifies the position in the file for the next @code{read} or
|
|
@code{write} operation. @xref{File Positioning}, for more information
|
|
on the file position and what it means.
|
|
|
|
To read the current file position value from a descriptor, use
|
|
@code{lseek (@var{desc}, 0, SEEK_CUR)}.
|
|
|
|
@cindex file positioning on a file descriptor
|
|
@cindex positioning a file descriptor
|
|
@cindex seeking on a file descriptor
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@deftypefun off_t lseek (int @var{filedes}, off_t @var{offset}, int @var{whence})
|
|
The @code{lseek} function is used to change the file position of the
|
|
file with descriptor @var{filedes}.
|
|
|
|
The @var{whence} argument specifies how the @var{offset} should be
|
|
interpreted, in the same way as for the @code{fseek} function, and it must
|
|
be one of the symbolic constants @code{SEEK_SET}, @code{SEEK_CUR}, or
|
|
@code{SEEK_END}.
|
|
|
|
@table @code
|
|
@item SEEK_SET
|
|
Specifies that @var{offset} is a count of characters from the beginning
|
|
of the file.
|
|
|
|
@item SEEK_CUR
|
|
Specifies that @var{offset} is a count of characters from the current
|
|
file position. This count may be positive or negative.
|
|
|
|
@item SEEK_END
|
|
Specifies that @var{offset} is a count of characters from the end of
|
|
the file. A negative count specifies a position within the current
|
|
extent of the file; a positive count specifies a position past the
|
|
current end. If you set the position past the current end, and
|
|
actually write data, you will extend the file with zeros up to that
|
|
position.
|
|
@end table
|
|
|
|
The return value from @code{lseek} is normally the resulting file
|
|
position, measured in bytes from the beginning of the file.
|
|
You can use this feature together with @code{SEEK_CUR} to read the
|
|
current file position.
|
|
|
|
If you want to append to the file, setting the file position to the
|
|
current end of file with @code{SEEK_END} is not sufficient. Another
|
|
process may write more data after you seek but before you write,
|
|
extending the file so the position you write onto clobbers their data.
|
|
Instead, use the @code{O_APPEND} operating mode; @pxref{Operating Modes}.
|
|
|
|
You can set the file position past the current end of the file. This
|
|
does not by itself make the file longer; @code{lseek} never changes the
|
|
file. But subsequent output at that position will extend the file.
|
|
Characters between the previous end of file and the new position are
|
|
filled with zeros. Extending the file in this way can create a
|
|
``hole'': the blocks of zeros are not actually allocated on disk, so the
|
|
file takes up less space than it appears to; it is then called a
|
|
``sparse file''.
|
|
@cindex sparse files
|
|
@cindex holes in files
|
|
|
|
If the file position cannot be changed, or the operation is in some way
|
|
invalid, @code{lseek} returns a value of @math{-1}. The following
|
|
@code{errno} error conditions are defined for this function:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{filedes} is not a valid file descriptor.
|
|
|
|
@item EINVAL
|
|
The @var{whence} argument value is not valid, or the resulting
file offset is not valid (for example, a negative offset for a regular
file).
|
|
|
|
@item ESPIPE
|
|
The @var{filedes} corresponds to an object that cannot be positioned,
|
|
such as a pipe, FIFO or terminal device. (POSIX.1 specifies this error
|
|
only for pipes and FIFOs, but in the GNU system, you always get
|
|
@code{ESPIPE} if the object is not seekable.)
|
|
@end table
|
|
|
|
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} the
|
|
@code{lseek} function is in fact @code{lseek64} and the type
|
|
@code{off_t} has 64 bits which makes it possible to handle files up to
|
|
@math{2^63} bytes in length.
|
|
|
|
This function is a cancellation point in multi-threaded programs. This
|
|
is a problem if the thread allocates some resources (like memory, file
|
|
descriptors, semaphores or whatever) at the time @code{lseek} is
|
|
called. If the thread gets cancelled these resources stay allocated
|
|
until the program ends. To avoid this, calls to @code{lseek} should be
|
|
protected using cancellation handlers.
|
|
@c ref pthread_cleanup_push / pthread_cleanup_pop
|
|
|
|
The @code{lseek} function is the underlying primitive for the
|
|
@code{fseek}, @code{fseeko}, @code{ftell}, @code{ftello} and
|
|
@code{rewind} functions, which operate on streams instead of file
|
|
descriptors.
|
|
@end deftypefun
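
Because @code{lseek} returns the resulting file position, it can also be
used to find the current size of a file. The following sketch does so
and then restores the original position; the helper name
@code{descriptor_size} is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the size in bytes of the file open on
   FILEDES, leaving its file position where it was.  Returns -1 if
   the descriptor cannot be positioned.  */
off_t
descriptor_size (int filedes)
{
  /* Remember where we are.  */
  off_t saved = lseek (filedes, 0, SEEK_CUR);
  if (saved == (off_t) -1)
    return -1;

  /* The position of the end of the file is its size.  */
  off_t end = lseek (filedes, 0, SEEK_END);
  if (end == (off_t) -1)
    return -1;

  /* Go back to the saved position.  */
  if (lseek (filedes, saved, SEEK_SET) == (off_t) -1)
    return -1;
  return end;
}
```

Seeking past the end and calling this helper before writing anything
shows that @code{lseek} alone does not extend the file; the size grows
only after data is written at the new position.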
|
|
|
|
@comment unistd.h
|
|
@comment Unix98
|
|
@deftypefun off64_t lseek64 (int @var{filedes}, off64_t @var{offset}, int @var{whence})
|
|
This function is similar to the @code{lseek} function. The difference
|
|
is that the @var{offset} parameter is of type @code{off64_t} instead of
|
|
@code{off_t} which makes it possible on 32 bit machines to address
|
|
files larger than @math{2^31} bytes and up to @math{2^63} bytes. The
|
|
file descriptor @var{filedes} must be opened using @code{open64} since
|
|
otherwise the large offsets possible with @code{off64_t} will lead to
|
|
errors with a descriptor in small file mode.
|
|
|
|
When the source file is compiled with @code{_FILE_OFFSET_BITS == 64} on a
32 bit machine, this function is actually available under the name
@code{lseek} and so transparently replaces the 32 bit interface.
|
|
@end deftypefun
|
|
|
|
You can have multiple descriptors for the same file if you open the file
|
|
more than once, or if you duplicate a descriptor with @code{dup}.
|
|
Descriptors that come from separate calls to @code{open} have independent
|
|
file positions; using @code{lseek} on one descriptor has no effect on the
|
|
other. For example,
|
|
|
|
@smallexample
|
|
@group
|
|
@{
|
|
int d1, d2;
|
|
char buf[4];
|
|
d1 = open ("foo", O_RDONLY);
|
|
d2 = open ("foo", O_RDONLY);
|
|
lseek (d1, 1024, SEEK_SET);
|
|
read (d2, buf, 4);
|
|
@}
|
|
@end group
|
|
@end smallexample
|
|
|
|
@noindent
|
|
will read the first four characters of the file @file{foo}. (The
|
|
error-checking code necessary for a real program has been omitted here
|
|
for brevity.)
|
|
|
|
By contrast, descriptors made by duplication share a common file
|
|
position with the original descriptor that was duplicated. Anything
|
|
which alters the file position of one of the duplicates, including
|
|
reading or writing data, affects all of them alike. Thus, for example,
|
|
|
|
@smallexample
|
|
@{
|
|
int d1, d2, d3;
|
|
char buf1[4], buf2[4];
|
|
d1 = open ("foo", O_RDONLY);
|
|
d2 = dup (d1);
|
|
d3 = dup (d2);
|
|
lseek (d3, 1024, SEEK_SET);
|
|
read (d1, buf1, 4);
|
|
read (d2, buf2, 4);
|
|
@}
|
|
@end smallexample
|
|
|
|
@noindent
|
|
will read four characters starting with the 1024'th character of
|
|
@file{foo}, and then four more characters starting with the 1028'th
|
|
character.
|
|
|
|
@comment sys/types.h
|
|
@comment POSIX.1
|
|
@deftp {Data Type} off_t
|
|
This is an arithmetic data type used to represent file sizes.
|
|
In the GNU system, this is equivalent to @code{fpos_t} or @code{long int}.
|
|
|
|
If the source is compiled with @code{_FILE_OFFSET_BITS == 64} this type
|
|
is transparently replaced by @code{off64_t}.
|
|
@end deftp
|
|
|
|
@comment sys/types.h
|
|
@comment Unix98
|
|
@deftp {Data Type} off64_t
|
|
This type is used similarly to @code{off_t}. The difference is that even
|
|
on 32 bit machines, where the @code{off_t} type would have 32 bits,
|
|
@code{off64_t} has 64 bits and so is able to address files up to
|
|
@math{2^63} bytes in length.
|
|
|
|
When compiling with @code{_FILE_OFFSET_BITS == 64} this type is
|
|
available under the name @code{off_t}.
|
|
@end deftp
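
The effect of @code{_FILE_OFFSET_BITS} can be checked at compile time.
The following sketch assumes a C11 compiler (for @code{_Static_assert})
and a system that supports the 64 bit interface. The helper name
@code{off_t_bits} is hypothetical. Note that the macro has to be defined
before the first system header is included.

@smallexample
/* This must come before the first #include.  */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* With the definition above, off_t is expected to be 64 bits
   wide even on 32 bit machines.  */
_Static_assert (sizeof (off_t) * CHAR_BIT == 64,
                "off_t is not a 64 bit type");

/* Return the width of off_t in bits.  */
int
off_t_bits (void)
@{
  return (int) (sizeof (off_t) * CHAR_BIT);
@}
@end smallexample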
|
|
|
|
These aliases for the @samp{SEEK_@dots{}} constants exist for the sake
|
|
of compatibility with older BSD systems. They are defined in two
|
|
different header files: @file{fcntl.h} and @file{sys/file.h}.
|
|
|
|
@table @code
|
|
@item L_SET
|
|
An alias for @code{SEEK_SET}.
|
|
|
|
@item L_INCR
|
|
An alias for @code{SEEK_CUR}.
|
|
|
|
@item L_XTND
|
|
An alias for @code{SEEK_END}.
|
|
@end table
|
|
|
|
@node Descriptors and Streams
|
|
@section Descriptors and Streams
|
|
@cindex streams, and file descriptors
|
|
@cindex converting file descriptor to stream
|
|
@cindex extracting file descriptor from stream
|
|
|
|
Given an open file descriptor, you can create a stream for it with the
|
|
@code{fdopen} function. You can get the underlying file descriptor for
|
|
an existing stream with the @code{fileno} function. These functions are
|
|
declared in the header file @file{stdio.h}.
|
|
@pindex stdio.h
|
|
|
|
@comment stdio.h
|
|
@comment POSIX.1
|
|
@deftypefun {FILE *} fdopen (int @var{filedes}, const char *@var{opentype})
|
|
The @code{fdopen} function returns a new stream for the file descriptor
|
|
@var{filedes}.
|
|
|
|
The @var{opentype} argument is interpreted in the same way as for the
|
|
@code{fopen} function (@pxref{Opening Streams}), except that
|
|
the @samp{b} option is not permitted; this is because GNU makes no
|
|
distinction between text and binary files. Also, @code{"w"} and
|
|
@code{"w+"} do not cause truncation of the file; these have an effect only
|
|
when opening a file, and in this case the file has already been opened.
|
|
You must make sure that the @var{opentype} argument matches the actual
|
|
mode of the open file descriptor.
|
|
|
|
The return value is the new stream. If the stream cannot be created
|
|
(for example, if the modes for the file indicated by the file descriptor
|
|
do not permit the access specified by the @var{opentype} argument), a
|
|
null pointer is returned instead.
|
|
|
|
In some other systems, @code{fdopen} may fail to detect that the modes
for the file descriptor do not permit the access specified by
@var{opentype}. The GNU C library always checks for this.
|
|
@end deftypefun
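
A common pattern is sketched below: open a descriptor with exactly the
flags you need, then wrap it in a stream whose @var{opentype} matches.
The helper name @code{open_log_stream} and the permission bits
@code{0644} are assumptions made for illustration only.

@smallexample
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open FILENAME for appending with a descriptor, then wrap the
   descriptor in a stream.  Return a null pointer on failure.  */
FILE *
open_log_stream (const char *filename)
@{
  int filedes;
  FILE *stream;

  filedes = open (filename, O_WRONLY | O_APPEND | O_CREAT, 0644);
  if (filedes < 0)
    return NULL;

  /* "a" matches the O_WRONLY | O_APPEND mode of the descriptor.  */
  stream = fdopen (filedes, "a");
  if (stream == NULL)
    /* No stream was created, so the descriptor is still ours
       to close.  */
    close (filedes);

  return stream;
@}
@end smallexample

@noindent
Once @code{fdopen} has succeeded, closing the stream with @code{fclose}
also closes the descriptor, so the caller should not close it again.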
|
|
|
|
For an example showing the use of the @code{fdopen} function,
|
|
see @ref{Creating a Pipe}.
|
|
|
|
@comment stdio.h
|
|
@comment POSIX.1
|
|
@deftypefun int fileno (FILE *@var{stream})
|
|
This function returns the file descriptor associated with the stream
|
|
@var{stream}. If an error is detected (for example, if the @var{stream}
|
|
is not valid) or if @var{stream} does not do I/O to a file,
|
|
@code{fileno} returns @math{-1}.
|
|
@end deftypefun
|
|
|
|
@comment stdio.h
|
|
@comment GNU
|
|
@deftypefun int fileno_unlocked (FILE *@var{stream})
|
|
The @code{fileno_unlocked} function is equivalent to the @code{fileno}
|
|
function except that it does not implicitly lock the stream if the state
|
|
is @code{FSETLOCKING_INTERNAL}.
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
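
As an illustration of an operation that needs the descriptor rather than
the stream, the following sketch uses @code{fileno} together with
@code{fstat} to find the size of the file behind a stream. The helper
name @code{stream_file_size} is hypothetical. Output that is still held
in the stream's buffer is not counted, because the system has not seen
it yet.

@smallexample
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the size in bytes of the file behind STREAM, or -1 if
   the stream has no descriptor or fstat fails.  */
off_t
stream_file_size (FILE *stream)
@{
  struct stat info;
  int filedes = fileno (stream);

  if (filedes < 0)
    return -1;
  if (fstat (filedes, &info) < 0)
    return -1;
  return info.st_size;
@}
@end smallexample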
|
|
|
|
@cindex standard file descriptors
|
|
@cindex file descriptors, standard
|
|
There are also symbolic constants defined in @file{unistd.h} for the
|
|
file descriptors belonging to the standard streams @code{stdin},
|
|
@code{stdout}, and @code{stderr}; see @ref{Standard Streams}.
|
|
@pindex unistd.h
|
|
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@table @code
|
|
@item STDIN_FILENO
|
|
@vindex STDIN_FILENO
|
|
This macro has value @code{0}, which is the file descriptor for
|
|
standard input.
|
|
@cindex standard input file descriptor
|
|
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@item STDOUT_FILENO
|
|
@vindex STDOUT_FILENO
|
|
This macro has value @code{1}, which is the file descriptor for
|
|
standard output.
|
|
@cindex standard output file descriptor
|
|
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@item STDERR_FILENO
|
|
@vindex STDERR_FILENO
|
|
This macro has value @code{2}, which is the file descriptor for
|
|
standard error output.
|
|
@end table
|
|
@cindex standard error file descriptor
|
|
|
|
@node Stream/Descriptor Precautions
|
|
@section Dangers of Mixing Streams and Descriptors
|
|
@cindex channels
|
|
@cindex streams and descriptors
|
|
@cindex descriptors and streams
|
|
@cindex mixing descriptors and streams
|
|
|
|
You can have multiple file descriptors and streams (let's call both
|
|
streams and descriptors ``channels'' for short) connected to the same
|
|
file, but you must take care to avoid confusion between channels. There
|
|
are two cases to consider: @dfn{linked} channels that share a single
|
|
file position value, and @dfn{independent} channels that have their own
|
|
file positions.
|
|
|
|
It's best to use just one channel in your program for actual data
|
|
transfer to any given file, except when all the access is for input.
|
|
For example, if you open a pipe (something you can only do at the file
|
|
descriptor level), either do all I/O with the descriptor, or construct a
|
|
stream from the descriptor with @code{fdopen} and then do all I/O with
|
|
the stream.
|
|
|
|
@menu
|
|
* Linked Channels:: Dealing with channels sharing a file position.
|
|
* Independent Channels:: Dealing with separately opened, unlinked channels.
|
|
* Cleaning Streams:: Cleaning a stream makes it safe to use
|
|
another channel.
|
|
@end menu
|
|
|
|
@node Linked Channels
|
|
@subsection Linked Channels
|
|
@cindex linked channels
|
|
|
|
Channels that come from a single opening share the same file position;
|
|
we call them @dfn{linked} channels. Linked channels result when you
|
|
make a stream from a descriptor using @code{fdopen}, when you get a
|
|
descriptor from a stream with @code{fileno}, when you copy a descriptor
|
|
with @code{dup} or @code{dup2}, and when descriptors are inherited
|
|
during @code{fork}. For files that don't support random access, such as
|
|
terminals and pipes, @emph{all} channels are effectively linked. On
|
|
random-access files, all append-type output streams are effectively
|
|
linked to each other.
|
|
|
|
@cindex cleaning up a stream
|
|
If you have been using a stream for I/O, and you want to do I/O using
|
|
another channel (either a stream or a descriptor) that is linked to it,
|
|
you must first @dfn{clean up} the stream that you have been using.
|
|
@xref{Cleaning Streams}.
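
The following sketch shows the order of operations for an output stream
and the descriptor linked to it. It uses @code{fflush}, which is enough
to clean an output stream; the helper name
@code{write_header_then_body} is hypothetical.

@smallexample
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write HEADER through STREAM and then BODY through the
   descriptor linked to it.  Return 0 on success, -1 on failure.  */
int
write_header_then_body (FILE *stream, const char *header,
                        const char *body)
@{
  size_t len = strlen (body);

  if (fputs (header, stream) == EOF)
    return -1;

  /* Clean the stream; otherwise HEADER could still be sitting
     in its buffer and would reach the file after BODY.  */
  if (fflush (stream) == EOF)
    return -1;

  /* A real program would loop here to handle short writes.  */
  if (write (fileno (stream), body, len) != (ssize_t) len)
    return -1;

  return 0;
@}
@end smallexample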
|
|
|
|
Terminating a process, or executing a new program in the process,
|
|
destroys all the streams in the process. If descriptors linked to these
|
|
streams persist in other processes, their file positions become
|
|
undefined as a result. To prevent this, you must clean up the streams
|
|
before destroying them.
|
|
|
|
@node Independent Channels
|
|
@subsection Independent Channels
|
|
@cindex independent channels
|
|
|
|
When you open channels (streams or descriptors) separately on a seekable
|
|
file, each channel has its own file position. These are called
|
|
@dfn{independent channels}.
|
|
|
|
The system handles each channel independently. Most of the time, this
|
|
is quite predictable and natural (especially for input): each channel
|
|
can read or write sequentially at its own place in the file. However,
|
|
if some of the channels are streams, you must take these precautions:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
You should clean an output stream after use, before doing anything else
|
|
that might read or write from the same part of the file.
|
|
|
|
@item
|
|
You should clean an input stream before reading data that may have been
|
|
modified using an independent channel. Otherwise, you might read
|
|
obsolete data that had been in the stream's buffer.
|
|
@end itemize
|
|
|
|
If you do output to one channel at the end of the file, this will
|
|
certainly leave the other independent channels positioned somewhere
|
|
before the new end. You cannot reliably set their file positions to the
|
|
new end of file before writing, because the file can always be extended
|
|
by another process between when you set the file position and when you
|
|
write the data. Instead, use an append-type descriptor or stream; they
|
|
always output at the current end of the file. In order to make the
|
|
end-of-file position accurate, you must clean the output channel you
|
|
were using, if it is a stream.
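
A minimal sketch of the append-type approach at the descriptor level
follows. The helper name @code{append_line} and the permission bits
@code{0644} are assumptions for illustration.

@smallexample
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append LINE to FILENAME.  Return 0 on success, -1 on failure.  */
int
append_line (const char *filename, const char *line)
@{
  size_t len = strlen (line);
  ssize_t written;
  int filedes;

  /* With O_APPEND, every write goes to the current end of the
     file, even if another process extended it in the meantime.  */
  filedes = open (filename, O_WRONLY | O_APPEND | O_CREAT, 0644);
  if (filedes < 0)
    return -1;

  written = write (filedes, line, len);
  if (close (filedes) < 0)
    return -1;
  return written == (ssize_t) len ? 0 : -1;
@}
@end smallexample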
|
|
|
|
It's impossible for two channels to have separate file pointers for a
|
|
file that doesn't support random access. Thus, channels for reading or
|
|
writing such files are always linked, never independent. Append-type
|
|
channels are also always linked. For these channels, follow the rules
|
|
for linked channels; see @ref{Linked Channels}.
|
|
|
|
@node Cleaning Streams
|
|
@subsection Cleaning Streams
|
|
|
|
On the GNU system, you can clean up any stream with @code{fclean}:
|
|
|
|
@comment stdio.h
|
|
@comment GNU
|
|
@deftypefun int fclean (FILE *@var{stream})
|
|
Clean up the stream @var{stream} so that its buffer is empty. If
|
|
@var{stream} is doing output, force it out. If @var{stream} is doing
|
|
input, give the data in the buffer back to the system, arranging to
|
|
reread it.
|
|
@end deftypefun
|
|
|
|
On other systems, you can use @code{fflush} to clean a stream in most
|
|
cases.
|
|
|
|
You can skip the @code{fclean} or @code{fflush} if you know the stream
|
|
is already clean. A stream is clean whenever its buffer is empty. For
|
|
example, an unbuffered stream is always clean. An input stream that is
|
|
at end-of-file is clean. A line-buffered stream is clean when the last
|
|
character output was a newline.
|
|
|
|
There is one case in which cleaning a stream is impossible on most
|
|
systems. This is when the stream is doing input from a file that is not
|
|
random-access. Such streams typically read ahead, and when the file is
|
|
not random access, there is no way to give back the excess data already
|
|
read. When an input stream reads from a random-access file,
|
|
@code{fflush} does clean the stream, but leaves the file pointer at an
|
|
unpredictable place; you must set the file pointer before doing any
|
|
further I/O. On the GNU system, using @code{fclean} avoids both of
|
|
these problems.
|
|
|
|
Closing an output-only stream also does @code{fflush}, so this is a
|
|
valid way of cleaning an output stream. On the GNU system, closing an
|
|
input stream does @code{fclean}.
|
|
|
|
You need not clean a stream before using its descriptor for control
|
|
operations such as setting terminal modes; these operations don't affect
|
|
the file position and are not affected by it. You can use any
|
|
descriptor for these operations, and all channels are affected
|
|
simultaneously. However, text already ``output'' to a stream but still
|
|
buffered by the stream will be subject to the new terminal modes when
|
|
subsequently flushed. To make sure ``past'' output is covered by the
|
|
terminal settings that were in effect at the time, flush the output
|
|
streams for that terminal before setting the modes. @xref{Terminal
|
|
Modes}.
|
|
|
|
@node Scatter-Gather
|
|
@section Fast Scatter-Gather I/O
|
|
@cindex scatter-gather
|
|
|
|
Some applications may need to read or write data to multiple buffers,
|
|
which are separated in memory. Although this can be done easily enough
|
|
with multiple calls to @code{read} and @code{write}, it is inefficient
|
|
because there is overhead associated with each kernel call.
|
|
|
|
Instead, many platforms provide special high-speed primitives to perform
|
|
these @dfn{scatter-gather} operations in a single kernel call. The GNU C
|
|
library will provide an emulation on any system that lacks these
|
|
primitives, so they are not a portability threat. They are defined in
|
|
@file{sys/uio.h}.
|
|
|
|
These functions are controlled with arrays of @code{iovec} structures,
|
|
which describe the location and size of each buffer.
|
|
|
|
@deftp {Data Type} {struct iovec}
|
|
|
|
The @code{iovec} structure describes a buffer. It contains two fields:
|
|
|
|
@table @code
|
|
|
|
@item void *iov_base
|
|
Contains the address of a buffer.
|
|
|
|
@item size_t iov_len
|
|
Contains the length of the buffer.
|
|
|
|
@end table
|
|
@end deftp
|
|
|
|
@deftypefun ssize_t readv (int @var{filedes}, const struct iovec *@var{vector}, int @var{count})
|
|
|
|
The @code{readv} function reads data from @var{filedes} and scatters it
|
|
into the buffers described in @var{vector}, which is taken to be
|
|
@var{count} structures long. As each buffer is filled, data is sent to the
|
|
next.
|
|
|
|
Note that @code{readv} is not guaranteed to fill all the buffers.
|
|
It may stop at any point, for the same reasons @code{read} would.
|
|
|
|
The return value is a count of bytes (@emph{not} buffers) read, @math{0}
|
|
indicating end-of-file, or @math{-1} indicating an error. The possible
|
|
errors are the same as in @code{read}.
|
|
|
|
@end deftypefun
|
|
|
|
@deftypefun ssize_t writev (int @var{filedes}, const struct iovec *@var{vector}, int @var{count})
|
|
|
|
The @code{writev} function gathers data from the buffers described in
|
|
@var{vector}, which is taken to be @var{count} structures long, and writes
|
|
them to @var{filedes}. As each buffer is written, it moves on to the
|
|
next.
|
|
|
|
Like @code{readv}, @code{writev} may stop midstream under the same
|
|
conditions @code{write} would.
|
|
|
|
The return value is a count of bytes written, or @math{-1} indicating an
|
|
error. The possible errors are the same as in @code{write}.
|
|
|
|
@end deftypefun
|
|
|
|
@c Note - I haven't read this anywhere. I surmised it from my knowledge
|
|
@c of computer science. Thus, there could be subtleties I'm missing.
|
|
|
|
Note that if the buffers are small (under about 1kB), high-level streams
|
|
may be easier to use than these functions. However, @code{readv} and
|
|
@code{writev} are more efficient when the individual buffers themselves
|
|
(as opposed to the total output) are large. In that case, a high-level
|
|
stream would not be able to cache the data effectively.
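
As a sketch of a gather operation, the following function writes two
separate strings with a single call to @code{writev}. The helper name
@code{write_header_and_body} is hypothetical.

@smallexample
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Write HEADER followed by BODY to FILEDES with a single call.
   Return the number of bytes written, or -1 on error.  */
ssize_t
write_header_and_body (int filedes, const char *header,
                       const char *body)
@{
  struct iovec vector[2];

  /* iov_base is not const-qualified, hence the casts; writev
     only reads from the buffers.  */
  vector[0].iov_base = (void *) header;
  vector[0].iov_len = strlen (header);
  vector[1].iov_base = (void *) body;
  vector[1].iov_len = strlen (body);

  return writev (filedes, vector, 2);
@}
@end smallexample

@noindent
Because @code{writev} may stop early, a careful caller compares the
return value with the total length of the two strings.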
|
|
|
|
@node Memory-mapped I/O
|
|
@section Memory-mapped I/O
|
|
|
|
On modern operating systems, it is possible to @dfn{mmap} (pronounced
|
|
``em-map'') a file to a region of memory. When this is done, the file can
|
|
be accessed just like an array in the program.
|
|
|
|
This is more efficient than @code{read} or @code{write}, as only the regions
|
|
of the file that a program actually accesses are loaded. Accesses to
|
|
not-yet-loaded parts of the mmapped region are handled in the same way as
|
|
swapped out pages.
|
|
|
|
Since mmapped pages can be stored back to their file when physical
|
|
memory is low, it is possible to mmap files orders of magnitude larger
|
|
than both the physical memory @emph{and} swap space. The only limit is
|
|
address space. The theoretical limit is 4GB on a 32-bit machine;
however, the actual limit will be smaller since some areas will be
reserved for other purposes. If the LFS interface is used, the file size
on 32-bit systems is not limited to 2GB (offsets are signed, which
reduces the addressable area of 4GB by half); the full 64 bits are
available.
|
|
|
|
Memory mapping only works on entire pages of memory. Thus, addresses
|
|
for mapping must be page-aligned, and length values will be rounded up.
|
|
To determine the size of a page the machine uses, one should use
|
|
|
|
@vindex _SC_PAGESIZE
|
|
@smallexample
|
|
size_t page_size = (size_t) sysconf (_SC_PAGESIZE);
|
|
@end smallexample
|
|
|
|
@noindent
|
|
These functions are declared in @file{sys/mman.h}.
|
|
|
|
@deftypefun {void *} mmap (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off_t @var{offset})
|
|
|
|
The @code{mmap} function creates a new mapping, connected to bytes
|
|
(@var{offset}) to (@var{offset} + @var{length}) in the file open on
|
|
@var{filedes}.
|
|
|
|
@var{address} gives a preferred starting address for the mapping.
|
|
@code{NULL} expresses no preference. Any previous mapping at that
|
|
address is automatically removed. The address you give may still be
|
|
changed, unless you use the @code{MAP_FIXED} flag.
|
|
|
|
@vindex PROT_READ
|
|
@vindex PROT_WRITE
|
|
@vindex PROT_EXEC
|
|
@var{protect} contains flags that control what kind of access is
|
|
permitted. They include @code{PROT_READ}, @code{PROT_WRITE}, and
|
|
@code{PROT_EXEC}, which permit reading, writing, and execution,
|
|
respectively. Inappropriate access will cause a segfault (@pxref{Program
|
|
Error Signals}).
|
|
|
|
Note that most hardware designs cannot support write permission without
|
|
read permission, and many do not distinguish read and execute permission.
|
|
Thus, you may receive wider permissions than you ask for, and mappings of
|
|
write-only files may be denied even if you do not use @code{PROT_READ}.
|
|
|
|
@var{flags} contains flags that control the nature of the map.
|
|
One of @code{MAP_SHARED} or @code{MAP_PRIVATE} must be specified.
|
|
|
|
The available flags include:
|
|
|
|
@vtable @code
|
|
@item MAP_PRIVATE
|
|
This specifies that writes to the region should never be written back
|
|
to the attached file. Instead, a copy is made for the process, and the
|
|
region will be swapped normally if memory runs low. No other process will
|
|
see the changes.
|
|
|
|
Since private mappings effectively revert to ordinary memory
|
|
when written to, you must have enough virtual memory for a copy of
|
|
the entire mmapped region if you use this mode with @code{PROT_WRITE}.
|
|
|
|
@item MAP_SHARED
|
|
This specifies that writes to the region will be written back to the
|
|
file. Changes made will be shared immediately with other processes
|
|
mmapping the same file.
|
|
|
|
Note that actual writing may take place at any time. You need to use
|
|
@code{msync}, described below, if it is important that other processes
|
|
using conventional I/O get a consistent view of the file.
|
|
|
|
@item MAP_FIXED
|
|
This forces the system to use the exact mapping address specified in
|
|
@var{address} and fail if it can't.
|
|
|
|
@c One of these is official - the other is obviously an obsolete synonym
|
|
@c Which is which?
|
|
@item MAP_ANONYMOUS
|
|
@itemx MAP_ANON
|
|
This flag tells the system to create an anonymous mapping, not connected
|
|
to a file. @var{filedes} and @var{offset} are ignored, and the region is
|
|
initialized with zeros.
|
|
|
|
Anonymous maps are used as the basic primitive to extend the heap on some
|
|
systems. They are also useful to share data between multiple tasks
|
|
without creating a file.
|
|
|
|
On some systems using private anonymous mmaps is more efficient than using
|
|
@code{malloc} for large blocks. This is not an issue with the GNU C library,
|
|
as the included @code{malloc} automatically uses @code{mmap} where appropriate.
|
|
|
|
@c Linux has some other MAP_ options, which I have not discussed here.
|
|
@c MAP_DENYWRITE, MAP_EXECUTABLE and MAP_GROWSDOWN don't seem applicable to
|
|
@c user programs (and I don't understand the last two). MAP_LOCKED does
|
|
@c not appear to be implemented.
|
|
|
|
@end vtable
|
|
|
|
@code{mmap} returns the address of the new mapping, or @code{MAP_FAILED}
(the value @code{(void *) -1}) for an error.
|
|
|
|
Possible errors include:
|
|
|
|
@table @code
|
|
|
|
@item EINVAL
|
|
|
|
Either @var{address} was unusable, or inconsistent @var{flags} were
|
|
given.
|
|
|
|
@item EACCES
|
|
|
|
@var{filedes} was not open for the type of access specified in @var{protect}.
|
|
|
|
@item ENOMEM
|
|
|
|
Either there is not enough memory for the operation, or the process is
|
|
out of address space.
|
|
|
|
@item ENODEV
|
|
|
|
This file is of a type that doesn't support mapping.
|
|
|
|
@item ENOEXEC
|
|
|
|
The file is on a filesystem that doesn't support mapping.
|
|
|
|
@c On Linux, EAGAIN will appear if the file has a conflicting mandatory lock.
|
|
@c However mandatory locks are not discussed in this manual.
|
|
@c
|
|
@c Similarly, ETXTBSY will occur if the MAP_DENYWRITE flag (not documented
|
|
@c here) is used and the file is already open for writing.
|
|
|
|
@end table
|
|
|
|
@end deftypefun
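
The following sketch creates a private anonymous mapping and shows how
to test the result of @code{mmap}. It assumes a system that defines
@code{MAP_ANONYMOUS} (some compilers in strict standard mode need a
feature test macro such as @code{_GNU_SOURCE} for this). Passing
@math{-1} as @var{filedes} is a common convention for anonymous
mappings, and the helper name @code{map_zeroed} is hypothetical.

@smallexample
#include <stddef.h>
#include <sys/mman.h>

/* Return LENGTH bytes of zero-filled memory, or a null pointer
   on failure.  Release the block with munmap.  */
void *
map_zeroed (size_t length)
@{
  void *block = mmap (NULL, length, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  /* mmap reports failure with MAP_FAILED, which is (void *) -1,
     not with a null pointer.  */
  return block == MAP_FAILED ? NULL : block;
@}
@end smallexample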
|
|
|
|
@deftypefun {void *} mmap64 (void *@var{address}, size_t @var{length}, int @var{protect}, int @var{flags}, int @var{filedes}, off64_t @var{offset})
|
|
The @code{mmap64} function is equivalent to the @code{mmap} function but
|
|
the @var{offset} parameter is of type @code{off64_t}. On 32-bit systems
|
|
this allows the file associated with the @var{filedes} descriptor to be
|
|
larger than 2GB. @var{filedes} must be a descriptor returned from a
call to @code{open64}, or one retrieved with @code{fileno} from a stream
opened with @code{fopen64} or @code{freopen64}.
|
|
|
|
When the sources are translated with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is actually available under the name @code{mmap}. I.e., the
|
|
new, extended API using 64 bit file sizes and offsets transparently
|
|
replaces the old API.
|
|
@end deftypefun
|
|
|
|
@deftypefun int munmap (void *@var{addr}, size_t @var{length})
|
|
|
|
@code{munmap} removes any memory maps from (@var{addr}) to (@var{addr} +
|
|
@var{length}). @var{length} should be the length of the mapping.
|
|
|
|
It is safe to unmap multiple mappings in one command, or include unmapped
|
|
space in the range. It is also possible to unmap only part of an existing
|
|
mapping. However, only entire pages can be removed. If @var{length} is not
a whole number of pages, it will be rounded up.
|
|
|
|
It returns @math{0} for success and @math{-1} for an error.
|
|
|
|
One error is possible:
|
|
|
|
@table @code
|
|
|
|
@item EINVAL
|
|
The memory range given was outside the user mmap range or wasn't page
|
|
aligned.
|
|
|
|
@end table
|
|
|
|
@end deftypefun
|
|
|
|
@deftypefun int msync (void *@var{address}, size_t @var{length}, int @var{flags})
|
|
|
|
When using shared mappings, the kernel can write the file at any time
|
|
before the mapping is removed. To be certain data has actually been
|
|
written to the file and will be accessible to non-memory-mapped I/O, it
|
|
is necessary to use this function.
|
|
|
|
It operates on the region @var{address} to (@var{address} + @var{length}).
|
|
It may be used on part of a mapping or multiple mappings; however, the
|
|
region given should not contain any unmapped space.
|
|
|
|
@var{flags} can contain some options:
|
|
|
|
@vtable @code
|
|
|
|
@item MS_SYNC
|
|
|
|
This flag makes sure the data is actually written @emph{to disk}.
|
|
Normally @code{msync} only makes sure that accesses to a file with
|
|
conventional I/O reflect the recent changes.
|
|
|
|
@item MS_ASYNC
|
|
|
|
This tells @code{msync} to begin the synchronization, but not to wait for
|
|
it to complete.
|
|
|
|
@c Linux also has MS_INVALIDATE, which I don't understand.
|
|
|
|
@end vtable
|
|
|
|
@code{msync} returns @math{0} for success and @math{-1} for
|
|
error. Errors include:
|
|
|
|
@table @code
|
|
|
|
@item EINVAL
|
|
An invalid region was given, or the @var{flags} were invalid.
|
|
|
|
@item EFAULT
|
|
There is no existing mapping in at least part of the given region.
|
|
|
|
@end table
|
|
|
|
@end deftypefun
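
The sketch below changes the first byte of a file through a shared
mapping and then calls @code{msync} with @code{MS_SYNC} before removing
the mapping. It assumes that the file is not empty, since touching a
mapped page that lies entirely beyond the end of the file is an error.
The helper name @code{set_first_byte} is hypothetical.

@smallexample
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Replace the first byte of the non-empty file FILENAME with C.
   Return 0 on success, -1 on failure.  */
int
set_first_byte (const char *filename, char c)
@{
  size_t length = (size_t) sysconf (_SC_PAGESIZE);
  int result = -1;
  char *map;
  int filedes;

  filedes = open (filename, O_RDWR);
  if (filedes < 0)
    return -1;

  map = mmap (NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED,
              filedes, 0);
  /* The mapping stays valid after the descriptor is closed.  */
  close (filedes);
  if (map == MAP_FAILED)
    return -1;

  map[0] = c;

  /* Wait until the change has been written to the file.  */
  if (msync (map, length, MS_SYNC) == 0)
    result = 0;
  if (munmap (map, length) < 0)
    result = -1;
  return result;
@}
@end smallexample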
|
|
|
|
@deftypefun {void *} mremap (void *@var{address}, size_t @var{length}, size_t @var{new_length}, int @var{flag})
|
|
|
|
This function can be used to change the size of an existing memory
|
|
area. @var{address} and @var{length} must cover a region entirely mapped
|
|
in the same @code{mmap} statement. A new mapping with the same
|
|
characteristics will be returned with the length @var{new_length}.
|
|
|
|
One option is possible, @code{MREMAP_MAYMOVE}. If it is given in
|
|
@var{flag}, the system may remove the existing mapping and create a new
|
|
one of the desired length in another location.
|
|
|
|
The address of the resulting mapping is returned, or @math{-1}. Possible
|
|
error codes include:
|
|
|
|
@table @code
|
|
|
|
@item EFAULT
|
|
There is no existing mapping in at least part of the original region, or
|
|
the region covers two or more distinct mappings.
|
|
|
|
@item EINVAL
|
|
The address given is misaligned or inappropriate.
|
|
|
|
@item EAGAIN
|
|
The region has pages locked, and if extended it would exceed the
|
|
process's resource limit for locked pages. @xref{Limits on Resources}.
|
|
|
|
@item ENOMEM
|
|
The region is private writeable, and insufficient virtual memory is
|
|
available to extend it. Also, this error will occur if
|
|
@code{MREMAP_MAYMOVE} is not given and the extension would collide with
|
|
another mapped region.
|
|
|
|
@end table
|
|
@end deftypefun
|
|
|
|
This function is only available on a few systems. One should not rely
on it except to perform optional optimizations.
|
|
|
|
Not all file descriptors may be mapped. Sockets, pipes, and most devices
|
|
only allow sequential access and do not fit into the mapping abstraction.
|
|
In addition, some regular files may not be mmapable, and older kernels may
|
|
not support mapping at all. Thus, programs using @code{mmap} should
|
|
have a fallback method to use should it fail. @xref{Mmap,,,standards,GNU
|
|
Coding Standards}.
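
One possible shape for such a fallback is sketched below: the function
first tries to map the file and, if that is not possible, processes it
with ordinary @code{read} calls instead. The names
@code{count_in_buffer} and @code{count_newlines} and the buffer size of
4096 bytes are assumptions made for illustration.

@smallexample
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the newline characters in the LENGTH bytes at DATA.  */
static long
count_in_buffer (const char *data, size_t length)
@{
  long count = 0;
  size_t i;

  for (i = 0; i < length; i++)
    if (data[i] == '\n')
      count++;
  return count;
@}

/* Count the newlines in FILENAME, or return -1 on error.  */
long
count_newlines (const char *filename)
@{
  struct stat info;
  char buffer[4096];
  ssize_t nread;
  long count = 0;
  void *map = MAP_FAILED;
  int filedes = open (filename, O_RDONLY);

  if (filedes < 0)
    return -1;

  /* Try to map the file; an empty file cannot be mapped.  */
  if (fstat (filedes, &info) == 0 && info.st_size > 0)
    map = mmap (NULL, (size_t) info.st_size, PROT_READ,
                MAP_PRIVATE, filedes, 0);

  if (map != MAP_FAILED)
    @{
      count = count_in_buffer (map, (size_t) info.st_size);
      munmap (map, (size_t) info.st_size);
    @}
  else
    @{
      /* Fall back to conventional I/O.  */
      while ((nread = read (filedes, buffer, sizeof buffer)) > 0)
        count += count_in_buffer (buffer, (size_t) nread);
      if (nread < 0)
        count = -1;
    @}

  close (filedes);
  return count;
@}
@end smallexample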
|
|
|
|
@c XXX madvice documentation missing
|
|
|
|
@node Waiting for I/O
|
|
@section Waiting for Input or Output
|
|
@cindex waiting for input or output
|
|
@cindex multiplexing input
|
|
@cindex input from multiple files
|
|
|
|
Sometimes a program needs to accept input on multiple input channels
|
|
whenever input arrives. For example, some workstations may have devices
|
|
such as a digitizing tablet, function button box, or dial box that are
|
|
connected via normal asynchronous serial interfaces; good user interface
|
|
style requires responding immediately to input on any device. Another
|
|
example is a program that acts as a server to several other processes
|
|
via pipes or sockets.
|
|
|
|
You cannot normally use @code{read} for this purpose, because this
|
|
blocks the program until input is available on one particular file
|
|
descriptor; input on other channels won't wake it up. You could set
|
|
nonblocking mode and poll each file descriptor in turn, but this is very
|
|
inefficient.
|
|
|
|
A better solution is to use the @code{select} function. This blocks the
|
|
program until input or output is ready on a specified set of file
|
|
descriptors, or until a timer expires, whichever comes first. This
|
|
facility is declared in the header file @file{sys/types.h}.
|
|
@pindex sys/types.h
|
|
|
|
In the case of a server socket (@pxref{Listening}), we say that
|
|
``input'' is available when there are pending connections that could be
|
|
accepted (@pxref{Accepting Connections}). @code{accept} for server
|
|
sockets blocks and interacts with @code{select} just as @code{read} does
|
|
for normal input.
|
|
|
|
@cindex file descriptor sets, for @code{select}
|
|
The file descriptor sets for the @code{select} function are specified
|
|
as @code{fd_set} objects. Here is the description of the data type
|
|
and some macros for manipulating these objects.
|
|
|
|
@comment sys/types.h
|
|
@comment BSD
|
|
@deftp {Data Type} fd_set
|
|
The @code{fd_set} data type represents file descriptor sets for the
|
|
@code{select} function. It is actually a bit array.
|
|
@end deftp
|
|
|
|
@comment sys/types.h
|
|
@comment BSD
|
|
@deftypevr Macro int FD_SETSIZE
|
|
The value of this macro is the maximum number of file descriptors that a
|
|
@code{fd_set} object can hold information about. On systems with a
|
|
fixed maximum number, @code{FD_SETSIZE} is at least that number. On
|
|
some systems, including GNU, there is no absolute limit on the number of
|
|
descriptors open, but this macro still has a constant value which
|
|
controls the number of bits in an @code{fd_set}; if you get a file
|
|
descriptor with a value as high as @code{FD_SETSIZE}, you cannot put
|
|
that descriptor into an @code{fd_set}.
|
|
@end deftypevr
|
|
|
|
@comment sys/types.h
|
|
@comment BSD
|
|
@deftypefn Macro void FD_ZERO (fd_set *@var{set})
|
|
This macro initializes the file descriptor set @var{set} to be the
|
|
empty set.
|
|
@end deftypefn
|
|
|
|
@comment sys/types.h
|
|
@comment BSD
|
|
@deftypefn Macro void FD_SET (int @var{filedes}, fd_set *@var{set})
|
|
This macro adds @var{filedes} to the file descriptor set @var{set}.
|
|
@end deftypefn
|
|
|
|
@comment sys/types.h
|
|
@comment BSD
|
|
@deftypefn Macro void FD_CLR (int @var{filedes}, fd_set *@var{set})
|
|
This macro removes @var{filedes} from the file descriptor set @var{set}.
|
|
@end deftypefn
|
|
|
|
@comment sys/types.h
|
|
@comment BSD
|
|
@deftypefn Macro int FD_ISSET (int @var{filedes}, fd_set *@var{set})
|
|
This macro returns a nonzero value (true) if @var{filedes} is a member
|
|
of the file descriptor set @var{set}, and zero (false) otherwise.
|
|
@end deftypefn
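
Because a descriptor whose value is @code{FD_SETSIZE} or more cannot be
stored in an @code{fd_set}, it can be worth checking the value before
calling @code{FD_SET}. The following sketch does this; the helper name
@code{fd_set_add} is hypothetical, and @file{sys/select.h} is included
on the assumption that the system follows newer POSIX versions, which
declare these macros there.

@smallexample
#include <sys/types.h>
#include <sys/select.h>

/* Add FILEDES to SET if it fits.  Return 0 on success, or -1 if
   FILEDES cannot be represented in an fd_set.  */
int
fd_set_add (int filedes, fd_set *set)
@{
  if (filedes < 0 || filedes >= FD_SETSIZE)
    return -1;
  FD_SET (filedes, set);
  return 0;
@}
@end smallexample

@noindent
The set passed to this function should first be initialized with
@code{FD_ZERO}; membership can then be tested with @code{FD_ISSET}.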
|
|
|
|
Next, here is the description of the @code{select} function itself.
|
|
|
|
@comment sys/types.h
|
|
@comment BSD
|
|
@deftypefun int select (int @var{nfds}, fd_set *@var{read-fds}, fd_set *@var{write-fds}, fd_set *@var{except-fds}, struct timeval *@var{timeout})
|
|
The @code{select} function blocks the calling process until there is
|
|
activity on any of the specified sets of file descriptors, or until the
|
|
timeout period has expired.
|
|
|
|
The file descriptors specified by the @var{read-fds} argument are
|
|
checked to see if they are ready for reading; the @var{write-fds} file
|
|
descriptors are checked to see if they are ready for writing; and the
|
|
@var{except-fds} file descriptors are checked for exceptional
|
|
conditions. You can pass a null pointer for any of these arguments if
|
|
you are not interested in checking for that kind of condition.
|
|
|
|
A file descriptor is considered ready for reading if a @code{read} call
will not block; this includes the case where the descriptor is at end of
file. A server socket is considered ready for reading if there is a
|
|
pending connection which can be accepted with @code{accept};
|
|
@pxref{Accepting Connections}. A client socket is ready for writing when
|
|
its connection is fully established; @pxref{Connecting}.
|
|
|
|
``Exceptional conditions'' does not mean errors---errors are reported
|
|
immediately when an erroneous system call is executed, and do not
|
|
constitute a state of the descriptor. Rather, they include conditions
|
|
such as the presence of an urgent message on a socket. (@xref{Sockets},
|
|
for information on urgent messages.)
|
|
|
|
The @code{select} function checks only the first @var{nfds} file
|
|
descriptors. The usual thing is to pass @code{FD_SETSIZE} as the value
|
|
of this argument.
|
|
|
|
The @var{timeout} specifies the maximum time to wait. If you pass a
|
|
null pointer for this argument, it means to block indefinitely until one
|
|
of the file descriptors is ready. Otherwise, you should provide the
|
|
time in @code{struct timeval} format; see @ref{High-Resolution Calendar}.
Specify zero as the time (a @code{struct timeval} containing
|
|
all zeros) if you want to find out which descriptors are ready without
|
|
waiting if none are ready.
|
|
|
|
The normal return value from @code{select} is the total number of ready file
|
|
descriptors in all of the sets. Each of the argument sets is overwritten
|
|
with information about the descriptors that are ready for the corresponding
|
|
operation. Thus, to see if a particular descriptor @var{desc} has input,
|
|
use @code{FD_ISSET (@var{desc}, @var{read-fds})} after @code{select} returns.
|
|
|
|
If @code{select} returns because the timeout period expires, it returns
|
|
a value of zero.
|
|
|
|
Any signal will cause @code{select} to return immediately. So if your
|
|
program uses signals, you can't rely on @code{select} to keep waiting
|
|
for the full time specified. If you want to be sure of waiting for a
|
|
particular amount of time, you must check for @code{EINTR} and repeat
|
|
the @code{select} with a newly calculated timeout based on the current
|
|
time. See the example below. See also @ref{Interrupted Primitives}.
|
|
|
|
If an error occurs, @code{select} returns @code{-1} and does not modify
|
|
the argument file descriptor sets. The following @code{errno} error
|
|
conditions are defined for this function:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
One of the file descriptor sets specified an invalid file descriptor.
|
|
|
|
@item EINTR
|
|
The operation was interrupted by a signal. @xref{Interrupted Primitives}.
|
|
|
|
@item EINVAL
|
|
The @var{timeout} argument is invalid; one of the components is negative
|
|
or too large.
|
|
@end table
|
|
@end deftypefun
|
|
|
|
@strong{Portability Note:} The @code{select} function is a BSD Unix
|
|
feature.
|
|
|
|
Here is an example showing how you can use @code{select} to establish a
|
|
timeout period for reading from a file descriptor. The @code{input_timeout}
|
|
function blocks the calling process until input is available on the
|
|
file descriptor, or until the timeout period expires.
|
|
|
|
@smallexample
|
|
@include select.c.texi
|
|
@end smallexample
|
|
|
|
There is another example showing the use of @code{select} to multiplex
|
|
input from multiple sockets in @ref{Server Example}.
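
The advice above about repeating @code{select} after @code{EINTR} with a
newly calculated timeout might be sketched as follows. The function name
@code{wait_readable} is hypothetical, and the use of
@code{clock_gettime} with @code{CLOCK_MONOTONIC} to measure elapsed time
is an assumption of this sketch, not something the text prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical helper: wait until FD is readable or SECONDS have elapsed.
   Returns 1 if readable, 0 on timeout, -1 on an error other than EINTR.  */
int
wait_readable (int fd, double seconds)
{
  struct timespec start, now;

  if (fd < 0 || fd >= FD_SETSIZE || seconds < 0)
    {
      errno = EINVAL;
      return -1;
    }
  clock_gettime (CLOCK_MONOTONIC, &start);

  for (;;)
    {
      fd_set set;
      struct timeval tv;
      double elapsed, left;
      int n;

      /* Work out how much of the requested timeout is left.  */
      clock_gettime (CLOCK_MONOTONIC, &now);
      elapsed = (double) (now.tv_sec - start.tv_sec)
                + (double) (now.tv_nsec - start.tv_nsec) / 1e9;
      left = seconds > elapsed ? seconds - elapsed : 0.0;
      tv.tv_sec = (time_t) left;
      tv.tv_usec = (suseconds_t) ((left - (double) tv.tv_sec) * 1e6);
      if (tv.tv_usec > 999999)
        tv.tv_usec = 999999;

      /* select overwrites the set (and, on some systems, the timeout),
         so both are rebuilt on every pass.  */
      FD_ZERO (&set);
      FD_SET (fd, &set);

      n = select (fd + 1, &set, NULL, NULL, &tv);
      if (n >= 0)
        return n > 0;
      if (errno != EINTR)
        return -1;
      /* Interrupted by a signal: loop and wait for the remaining time.  */
    }
}
```

Passing @code{fd + 1} rather than @code{FD_SETSIZE} as the first
argument is a choice made here so that @code{select} examines only the
descriptors that matter.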
|
|
|
|
|
|
@node Synchronizing I/O
|
|
@section Synchronizing I/O operations
|
|
|
|
@cindex synchronizing
|
|
In most modern operating systems the normal I/O operations are not
|
|
executed synchronously. I.e., even if a @code{write} system call
|
|
returns, this does not mean the data is actually written to the media,
|
|
e.g., the disk.
|
|
|
|
In situations where synchronization points are necessary, you can use
|
|
special functions which ensure that all operations finish before
|
|
they return.
|
|
|
|
@comment unistd.h
|
|
@comment X/Open
|
|
@deftypefun int sync (void)
|
|
A call to this function will not return as long as there is data which
|
|
has not been written to the device. All dirty buffers in the kernel will
|
|
be written and so an overall consistent system can be achieved (if no
|
|
other process in parallel writes data).
|
|
|
|
A prototype for @code{sync} can be found in @file{unistd.h}.
|
|
|
|
The return value is zero to indicate no error.
|
|
@end deftypefun
|
|
|
|
Programs more often want to ensure that data written to a given file is
|
|
committed, rather than all data in the system. For this, @code{sync} is overkill.
|
|
|
|
|
|
@comment unistd.h
|
|
@comment POSIX
|
|
@deftypefun int fsync (int @var{fildes})
|
|
The @code{fsync} function can be used to make sure all data associated with the
|
|
open file @var{fildes} is written to the device associated with the
|
|
descriptor. The function call does not return unless all actions have
|
|
finished.
|
|
|
|
A prototype for @code{fsync} can be found in @file{unistd.h}.
|
|
|
|
This function is a cancellation point in multi-threaded programs. This
|
|
is a problem if the thread allocates some resources (like memory, file
|
|
descriptors, semaphores or whatever) at the time @code{fsync} is
|
|
called. If the thread gets cancelled these resources stay allocated
|
|
until the program ends. To avoid this, calls to @code{fsync} should be
|
|
protected using cancellation handlers.
|
|
@c ref pthread_cleanup_push / pthread_cleanup_pop
|
|
|
|
The return value of the function is zero if no error occurred. Otherwise
|
|
it is @math{-1} and the global variable @code{errno} is set to one of the
following values:
|
|
@table @code
|
|
@item EBADF
|
|
The descriptor @var{fildes} is not valid.
|
|
|
|
@item EINVAL
|
|
No synchronization is possible since the system does not implement this.
|
|
@end table
|
|
@end deftypefun
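
A minimal sketch of a synchronization point built on @code{fsync}: the
hypothetical helper @code{write_durably} (the name and the retry policy
are assumptions of this example) writes a whole buffer and returns only
after @code{fsync} has reported that the data reached the device.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write LEN bytes from BUF to FD, then force them
   to the device.  Returns 0 on success, -1 with errno set on failure.  */
int
write_durably (int fd, const void *buf, size_t len)
{
  const char *p = buf;

  while (len > 0)
    {
      ssize_t n = write (fd, p, len);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* Interrupted before writing; retry.  */
          return -1;
        }
      p += n;
      len -= (size_t) n;
    }

  /* write only hands the data to the kernel; fsync waits for the device.  */
  return fsync (fd);
}
```

Where only the file contents matter, the final call could be replaced
by @code{fdatasync}, which takes the same argument.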
|
|
|
|
Sometimes it is not even necessary to write all data associated with a
|
|
file descriptor. E.g., in database files which do not change in size it
|
|
is enough to write all the file content data to the device.
|
|
Meta-information like the modification time etc. are not that important
|
|
and leaving such information uncommitted does not prevent a successful
|
|
recovery of the file in case of a problem.
|
|
|
|
@comment unistd.h
|
|
@comment POSIX
|
|
@deftypefun int fdatasync (int @var{fildes})
|
|
When a call to the @code{fdatasync} function returns, it is ensured
|
|
that all of the file data is written to the device. For all pending I/O
|
|
operations, the parts guaranteeing data integrity have finished.
|
|
|
|
Not all systems implement the @code{fdatasync} operation. On systems
|
|
missing this functionality @code{fdatasync} is emulated by a call to
|
|
@code{fsync} since the performed actions are a superset of those
|
|
required by @code{fdatasync}.
|
|
|
|
The prototype for @code{fdatasync} is in @file{unistd.h}.
|
|
|
|
The return value of the function is zero if no error occurred. Otherwise
|
|
it is @math{-1} and the global variable @code{errno} is set to one of the
following values:
|
|
@table @code
|
|
@item EBADF
|
|
The descriptor @var{fildes} is not valid.
|
|
|
|
@item EINVAL
|
|
No synchronization is possible since the system does not implement this.
|
|
@end table
|
|
@end deftypefun
|
|
|
|
|
|
@node Asynchronous I/O
|
|
@section Perform I/O Operations in Parallel
|
|
|
|
The POSIX.1b standard defines a new set of I/O operations which can
|
|
significantly reduce the time an application spends waiting for I/O. The
|
|
new functions allow a program to initiate one or more I/O operations and
|
|
then immediately resume normal work while the I/O operations are
|
|
executed in parallel. This functionality is available if the
|
|
@file{unistd.h} file defines the symbol @code{_POSIX_ASYNCHRONOUS_IO}.
|
|
|
|
These functions are part of the library with realtime functions named
|
|
@file{librt}. They are not actually part of the @file{libc} binary.
|
|
The implementation of these functions can be done using support in the
|
|
kernel (if available) or using an implementation based on threads at
|
|
userlevel. In the latter case it might be necessary to link applications
|
|
with the thread library @file{libpthread} in addition to @file{librt}.
|
|
|
|
All AIO operations operate on files which were opened previously. There
|
|
might be arbitrarily many operations running for one file. The
|
|
asynchronous I/O operations are controlled using a data structure named
|
|
@code{struct aiocb} (@dfn{AIO control block}). It is defined in
|
|
@file{aio.h} as follows.
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftp {Data Type} {struct aiocb}
|
|
The POSIX.1b standard mandates that the @code{struct aiocb} structure
|
|
contains at least the members described in the following table. There
|
|
might be more elements which are used by the implementation, but
|
|
depending on these elements is not portable and is highly deprecated.
|
|
|
|
@table @code
|
|
@item int aio_fildes
|
|
This element specifies the file descriptor which is used for the
|
|
operation. It must be a legal descriptor since otherwise the operation
|
|
fails.
|
|
|
|
The device on which the file is opened must allow the seek operation.
|
|
I.e., it is not possible to use any of the AIO operations on devices
|
|
like terminals where an @code{lseek} call would lead to an error.
|
|
|
|
@item off_t aio_offset
|
|
This element specifies at which offset in the file the operation (input
|
|
or output) is performed. Since the operations are carried out in arbitrary
|
|
order and more than one operation for one file descriptor can be
|
|
started, one cannot expect a current read/write position of the file
|
|
descriptor.
|
|
|
|
@item volatile void *aio_buf
|
|
This is a pointer to the buffer with the data to be written or the place
|
|
where the read data is stored.
|
|
|
|
@item size_t aio_nbytes
|
|
This element specifies the length of the buffer pointed to by @code{aio_buf}.
|
|
|
|
@item int aio_reqprio
|
|
If the platform has defined @code{_POSIX_PRIORITIZED_IO} and
|
|
@code{_POSIX_PRIORITY_SCHEDULING} the AIO requests are
|
|
processed based on the current scheduling priority. The
|
|
@code{aio_reqprio} element can then be used to lower the priority of the
|
|
AIO operation.
|
|
|
|
@item struct sigevent aio_sigevent
|
|
This element specifies how the calling process is notified once the
|
|
operation terminates. If the @code{sigev_notify} element is
|
|
@code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL}
|
|
the signal determined by @code{sigev_signo} is sent. Otherwise
|
|
@code{sigev_notify} must be @code{SIGEV_THREAD}. In this case a thread
|
|
is created which starts executing the function pointed to by
|
|
@code{sigev_notify_function}.
|
|
|
|
@item int aio_lio_opcode
|
|
This element is only used by the @code{lio_listio} and
|
|
@code{lio_listio64} functions. Since these functions allow an
|
|
arbitrary number of operations to start at once, and each operation can be
|
|
input or output (or nothing), the information must be stored in the
|
|
control block. The possible values are:
|
|
|
|
@vtable @code
|
|
@item LIO_READ
|
|
Start a read operation. Read from the file at position
|
|
@code{aio_offset} and store the next @code{aio_nbytes} bytes in the
|
|
buffer pointed to by @code{aio_buf}.
|
|
|
|
@item LIO_WRITE
|
|
Start a write operation. Write @code{aio_nbytes} bytes starting at
|
|
@code{aio_buf} into the file starting at position @code{aio_offset}.
|
|
|
|
@item LIO_NOP
|
|
Do nothing for this control block. This value is useful sometimes when
|
|
an array of @code{struct aiocb} values contains holes, i.e., some of the
|
|
values must not be handled although the whole array is presented to the
|
|
@code{lio_listio} function.
|
|
@end vtable
|
|
@end table
|
|
|
|
When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a
|
|
32 bit machine this type is in fact @code{struct aiocb64} since the LFS
|
|
interface transparently replaces the @code{struct aiocb} definition.
|
|
@end deftp
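
To make the roles of the members concrete, here is a small sketch that
fills in a control block for a read request. The function
@code{init_read_request} is a hypothetical name used only for this
illustration. Clearing the whole structure first is a precaution for
any implementation-specific members.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: describe a read of LEN bytes at OFFSET in the
   file open on FD, storing the data in BUF.  No notification is asked
   for, so the caller has to query the status itself.  */
void
init_read_request (struct aiocb *cb, int fd, void *buf,
                   size_t len, off_t offset)
{
  memset (cb, 0, sizeof *cb);         /* Clear implementation members.  */
  cb->aio_fildes = fd;                /* File to operate on.  */
  cb->aio_offset = offset;            /* Absolute position in the file.  */
  cb->aio_buf = buf;                  /* Where the data is stored.  */
  cb->aio_nbytes = len;               /* Size of the buffer.  */
  cb->aio_reqprio = 0;                /* Do not lower the priority.  */
  cb->aio_sigevent.sigev_notify = SIGEV_NONE;
  cb->aio_lio_opcode = LIO_READ;      /* Only used by lio_listio.  */
}
```

The buffer and the control block itself must stay valid until the
request has terminated, so neither should live in a stack frame that is
left while the operation is still in progress.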
|
|
|
|
For use with the AIO functions defined in the LFS there is a similar type
|
|
defined which replaces the types of the appropriate members with larger
|
|
types but otherwise is equivalent to @code{struct aiocb}. Particularly,
|
|
all member names are the same.
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftp {Data Type} {struct aiocb64}
|
|
@table @code
|
|
@item int aio_fildes
|
|
This element specifies the file descriptor which is used for the
|
|
operation. It must be a legal descriptor since otherwise the operation
|
|
fails for obvious reasons.
|
|
|
|
The device on which the file is opened must allow the seek operation.
|
|
I.e., it is not possible to use any of the AIO operations on devices
|
|
like terminals where an @code{lseek} call would lead to an error.
|
|
|
|
@item off64_t aio_offset
|
|
This element specifies at which offset in the file the operation (input
|
|
or output) is performed. Since the operations are carried out in arbitrary
|
|
order and more than one operation for one file descriptor can be
|
|
started, one cannot expect a current read/write position of the file
|
|
descriptor.
|
|
|
|
@item volatile void *aio_buf
|
|
This is a pointer to the buffer with the data to be written or the place
|
|
where the read data is stored.
|
|
|
|
@item size_t aio_nbytes
|
|
This element specifies the length of the buffer pointed to by @code{aio_buf}.
|
|
|
|
@item int aio_reqprio
|
|
If the platform has defined @code{_POSIX_PRIORITIZED_IO} and
@code{_POSIX_PRIORITY_SCHEDULING}, the AIO requests are
|
|
processed based on the current scheduling priority. The
|
|
@code{aio_reqprio} element can then be used to lower the priority of the
|
|
AIO operation.
|
|
|
|
@item struct sigevent aio_sigevent
|
|
This element specifies how the calling process is notified once the
|
|
operation terminates. If the @code{sigev_notify} element is
|
|
@code{SIGEV_NONE} no notification is sent. If it is @code{SIGEV_SIGNAL}
|
|
the signal determined by @code{sigev_signo} is sent. Otherwise
|
|
@code{sigev_notify} must be @code{SIGEV_THREAD} in which case a thread
|
|
is created which starts executing the function pointed to by
|
|
@code{sigev_notify_function}.
|
|
|
|
@item int aio_lio_opcode
|
|
This element is only used by the @code{lio_listio} and
|
|
@code{lio_listio64} functions. Since these functions allow an
|
|
arbitrary number of operations to start at once, and since each operation can be
|
|
input or output (or nothing), the information must be stored in the
|
|
control block. See the description of @code{struct aiocb} for a description
|
|
of the possible values.
|
|
@end table
|
|
|
|
When the sources are compiled using @code{_FILE_OFFSET_BITS == 64} on a
|
|
32 bit machine this type is available under the name @code{struct aiocb}
since the LFS transparently replaces the old interface.
|
|
@end deftp
|
|
|
|
@menu
|
|
* Asynchronous Reads/Writes:: Asynchronous Read and Write Operations.
|
|
* Status of AIO Operations:: Getting the Status of AIO Operations.
|
|
* Synchronizing AIO Operations:: Getting into a consistent state.
|
|
* Cancel AIO Operations:: Cancellation of AIO Operations.
|
|
* Configuration of AIO:: How to optimize the AIO implementation.
|
|
@end menu
|
|
|
|
@node Asynchronous Reads/Writes
|
|
@subsection Asynchronous Read and Write Operations
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun int aio_read (struct aiocb *@var{aiocbp})
|
|
This function initiates an asynchronous read operation. It
|
|
immediately returns after the operation was enqueued or when an
|
|
error was encountered.
|
|
|
|
The first @code{aiocbp->aio_nbytes} bytes of the file for which
|
|
@code{aiocbp->aio_fildes} is a descriptor are written to the buffer
|
|
starting at @code{aiocbp->aio_buf}. Reading starts at the absolute
|
|
position @code{aiocbp->aio_offset} in the file.
|
|
|
|
If prioritized I/O is supported by the platform the
|
|
@code{aiocbp->aio_reqprio} value is used to adjust the priority before
|
|
the request is actually enqueued.
|
|
|
|
The calling process is notified about the termination of the read
|
|
request according to the @code{aiocbp->aio_sigevent} value.
|
|
|
|
When @code{aio_read} returns, the return value is zero if no error
|
|
occurred that can be found before the request is enqueued. If such an
|
|
early error is found, the function returns @math{-1} and sets
|
|
@code{errno} to one of the following values:
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
The request was not enqueued due to (temporarily) exceeded resource
|
|
limitations.
|
|
@item ENOSYS
|
|
The @code{aio_read} function is not implemented.
|
|
@item EBADF
|
|
The @code{aiocbp->aio_fildes} descriptor is not valid. This condition
|
|
need not be recognized before enqueueing the request and so this error
|
|
might also be signaled asynchronously.
|
|
@item EINVAL
|
|
The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is
|
|
invalid. This condition need not be recognized before enqueueing the
|
|
request and so this error might also be signaled asynchronously.
|
|
@end table
|
|
|
|
If @code{aio_read} returns zero, the current status of the request
|
|
can be queried using the @code{aio_error} and @code{aio_return} functions.
|
|
As long as the value returned by @code{aio_error} is @code{EINPROGRESS}
|
|
the operation has not yet completed. If @code{aio_error} returns zero,
|
|
the operation successfully terminated, otherwise the value is to be
|
|
interpreted as an error code. If the operation terminated, the result of
|
|
the operation can be obtained using a call to @code{aio_return}. The
|
|
returned value is the same as an equivalent call to @code{read} would
|
|
have returned. Possible error codes returned by @code{aio_error} are:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @code{aiocbp->aio_fildes} descriptor is not valid.
|
|
@item ECANCELED
|
|
The operation was cancelled before the operation was finished
|
|
(@pxref{Cancel AIO Operations}).
|
|
@item EINVAL
|
|
The @code{aiocbp->aio_offset} value is invalid.
|
|
@end table
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{aio_read64} since the LFS interface transparently
|
|
replaces the normal implementation.
|
|
@end deftypefun
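
The sequence described above (enqueue with @code{aio_read}, poll with
@code{aio_error}, collect the result with @code{aio_return}) could look
roughly like this. The function name @code{read_at_async} and the one
millisecond pause between polls are assumptions of this sketch. On some
systems the program may have to be linked with @file{librt}.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: read up to LEN bytes at OFFSET from FD into BUF
   with aio_read.  Returns the byte count, or -1 with errno set.  */
ssize_t
read_at_async (int fd, void *buf, size_t len, off_t offset)
{
  struct aiocb cb;
  const struct timespec pause = { 0, 1000000 };   /* 1 ms */
  ssize_t n;
  int err;

  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_read (&cb) != 0)
    return -1;                  /* Early error; errno is already set.  */

  /* A real program would do useful work here instead of sleeping.  */
  while ((err = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&pause, NULL);

  n = aio_return (&cb);         /* Call exactly once per request.  */
  if (err != 0)
    errno = err;                /* Late error reported by aio_error.  */
  return n;
}
```

Polling in a loop is used here only to keep the example short; a
program that has nothing else to do while it waits gains little over a
plain @code{read}.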
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun int aio_read64 (struct aiocb64 *@var{aiocbp})
|
|
This function is similar to the @code{aio_read} function. The only
|
|
difference is that on @w{32 bit} machines the file descriptor should
|
|
be opened in the large file mode. Internally @code{aio_read64} uses
|
|
functionality equivalent to @code{lseek64} (@pxref{File Position Primitive})
to position the file descriptor correctly for the reading,
|
|
as opposed to @code{lseek} functionality used in @code{aio_read}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{aio_read} and so transparently
|
|
replaces the interface for small files on 32 bit machines.
|
|
@end deftypefun
|
|
|
|
To write data asynchronously to a file there exists an equivalent pair
|
|
of functions with a very similar interface.
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun int aio_write (struct aiocb *@var{aiocbp})
|
|
This function initiates an asynchronous write operation. The function
|
|
call immediately returns after the operation was enqueued or when an
error was encountered before that.
|
|
|
|
The first @code{aiocbp->aio_nbytes} bytes from the buffer starting at
|
|
@code{aiocbp->aio_buf} are written to the file for which
|
|
@code{aiocbp->aio_fildes} is a descriptor, starting at the absolute
|
|
position @code{aiocbp->aio_offset} in the file.
|
|
|
|
If prioritized I/O is supported by the platform the
|
|
@code{aiocbp->aio_reqprio} value is used to adjust the priority before
|
|
the request is actually enqueued.
|
|
|
|
The calling process is notified about the termination of the write
|
|
request according to the @code{aiocbp->aio_sigevent} value.
|
|
|
|
When @code{aio_write} returns the return value is zero if no error
|
|
occurred that can be found before the request is enqueued. If such an
|
|
early error is found the function returns @math{-1} and sets
|
|
@code{errno} to one of the following values.
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
The request was not enqueued due to (temporarily) exceeded resource
|
|
limitations.
|
|
@item ENOSYS
|
|
The @code{aio_write} function is not implemented.
|
|
@item EBADF
|
|
The @code{aiocbp->aio_fildes} descriptor is not valid. This condition
|
|
need not be recognized before enqueueing the request and so this error
|
|
might also be signaled asynchronously.
|
|
@item EINVAL
|
|
The @code{aiocbp->aio_offset} or @code{aiocbp->aio_reqprio} value is
|
|
invalid. This condition need not be recognized before enqueueing the
|
|
request and so this error might also be signaled asynchronously.
|
|
@end table
|
|
|
|
In the case @code{aio_write} returns zero the current status of the
|
|
request can be queried using @code{aio_error} and @code{aio_return}
|
|
functions. As long as the value returned by @code{aio_error} is
|
|
@code{EINPROGRESS} the operation has not yet completed. If
|
|
@code{aio_error} returns zero the operation successfully terminated,
|
|
otherwise the value is to be interpreted as an error code. If the
|
|
operation terminated, the result can be obtained using a call
|
|
to @code{aio_return}. The returned value is the same as an equivalent
|
|
call to @code{write} would have returned. Possible error codes returned
|
|
by @code{aio_error} are:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @code{aiocbp->aio_fildes} descriptor is not valid.
|
|
@item ECANCELED
|
|
The operation was cancelled before the operation was finished
|
|
(@pxref{Cancel AIO Operations}).
|
|
@item EINVAL
|
|
The @code{aiocbp->aio_offset} value is invalid.
|
|
@end table
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{aio_write64} since the LFS interface transparently
|
|
replaces the normal implementation.
|
|
@end deftypefun
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun int aio_write64 (struct aiocb64 *@var{aiocbp})
|
|
This function is similar to the @code{aio_write} function. The only
|
|
difference is that on @w{32 bit} machines the file descriptor should
|
|
be opened in the large file mode. Internally @code{aio_write64} uses
|
|
functionality equivalent to @code{lseek64} (@pxref{File Position Primitive})
to position the file descriptor correctly for the writing,
|
|
as opposed to @code{lseek} functionality used in @code{aio_write}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{aio_write} and so transparently
|
|
replaces the interface for small files on 32 bit machines.
|
|
@end deftypefun
|
|
|
|
Beside these functions with the more or less traditional interface
|
|
POSIX.1b also defines a function which can initiate more than one
|
|
operation at once and which can handle freely mixed read and write
|
|
operations. It is therefore similar to a combination of @code{readv} and
|
|
@code{writev}.
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun int lio_listio (int @var{mode}, struct aiocb *const @var{list}[], int @var{nent}, struct sigevent *@var{sig})
|
|
The @code{lio_listio} function can be used to enqueue an arbitrary
|
|
number of read and write requests at one time. The requests can all be
|
|
meant for the same file, all for different files, or any mixture in
|
|
between.
|
|
|
|
@code{lio_listio} gets the @var{nent} requests from the array pointed to
|
|
by @var{list}. What operation has to be performed is determined by the
|
|
@code{aio_lio_opcode} member in each element of @var{list}. If this
|
|
field is @code{LIO_READ} a read operation is queued, similar to a call
|
|
of @code{aio_read} for this element of the array (except that the way
|
|
the termination is signalled is different, as we will see below). If
|
|
the @code{aio_lio_opcode} member is @code{LIO_WRITE} a write operation
|
|
is enqueued. Otherwise the @code{aio_lio_opcode} must be @code{LIO_NOP}
|
|
in which case this element of @var{list} is simply ignored. This
|
|
``operation'' is useful in situations where one has a fixed array of
|
|
@code{struct aiocb} elements from which only a few need to be handled at
|
|
a time. Another situation is where the @code{lio_listio} call was
|
|
cancelled before all requests are processed (@pxref{Cancel AIO Operations})
and the remaining requests have to be reissued.
|
|
|
|
The other members of each element of the array pointed to by
|
|
@var{list} must have values suitable for the operation as described in
|
|
the documentation for @code{aio_read} and @code{aio_write} above.
|
|
|
|
The @var{mode} argument determines how @code{lio_listio} behaves after
|
|
having enqueued all the requests. If @var{mode} is @code{LIO_WAIT} it
|
|
waits until all requests have terminated. Otherwise @var{mode} must be
|
|
@code{LIO_NOWAIT} and in this case the function returns immediately after
|
|
having enqueued all the requests. In this case the caller gets a
|
|
notification of the termination of all requests according to the
|
|
@var{sig} parameter. If @var{sig} is @code{NULL} no notification is
|
|
sent. Otherwise a signal is sent or a thread is started, just as
|
|
described in the description for @code{aio_read} or @code{aio_write}.
|
|
|
|
If @var{mode} is @code{LIO_WAIT} the return value of @code{lio_listio}
|
|
is @math{0} when all requests completed successfully. Otherwise the
|
|
function returns @math{-1} and @code{errno} is set accordingly. To find
|
|
out which request or requests failed one has to use the @code{aio_error}
|
|
function on all the elements of the array @var{list}.
|
|
|
|
In case @var{mode} is @code{LIO_NOWAIT} the function returns @math{0} if
|
|
all requests were enqueued correctly. The current state of the requests
|
|
can be found using @code{aio_error} and @code{aio_return} as described
|
|
above. In case @code{lio_listio} returns @math{-1} in this mode the
|
|
global variable @code{errno} is set accordingly. If a request did not
|
|
yet terminate a call to @code{aio_error} returns @code{EINPROGRESS}. If
|
|
the value is different the request is finished and the error value (or
|
|
@math{0}) is returned and the result of the operation can be retrieved
|
|
using @code{aio_return}.
|
|
|
|
Possible values for @code{errno} are:
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
The resources necessary to queue all the requests are not available at
the moment. The error status of each element of @var{list} must be
checked to find out which request failed.
|
|
|
|
Another reason could be that the system wide limit of AIO requests is
|
|
exceeded. This cannot be the case for the implementation on GNU systems
|
|
since no arbitrary limits exist.
|
|
@item EINVAL
|
|
The @var{mode} parameter is invalid or @var{nent} is larger than
|
|
@code{AIO_LISTIO_MAX}.
|
|
@item EIO
|
|
One or more of the requested I/O operations failed. The error status of
each request should be checked to determine which one failed.
|
|
@item ENOSYS
|
|
The @code{lio_listio} function is not supported.
|
|
@end table
|
|
|
|
If the @var{mode} parameter is @code{LIO_NOWAIT} and the caller cancels
|
|
a request, the error status for this request returned by
|
|
@code{aio_error} is @code{ECANCELED}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{lio_listio64} since the LFS interface
|
|
transparently replaces the normal implementation.
|
|
@end deftypefun
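
As an illustration of the list interface, the following sketch submits
two writes and one @code{LIO_NOP} hole in a single @code{lio_listio}
call with @code{LIO_WAIT}, then checks every request individually. The
function name @code{write_two_chunks} and the layout of the array are
assumptions of this example. On some systems the program may have to be
linked with @file{librt}.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: write string A at offset 0 of FD and string B
   directly after it, using one lio_listio call that waits for both.
   Returns 0 if both writes completed in full, -1 otherwise.  */
int
write_two_chunks (int fd, const char *a, const char *b)
{
  struct aiocb cbs[3];
  struct aiocb *list[3] = { &cbs[0], &cbs[1], &cbs[2] };
  int i, result;

  memset (cbs, 0, sizeof cbs);
  for (i = 0; i < 3; i++)
    cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;

  cbs[0].aio_lio_opcode = LIO_WRITE;
  cbs[0].aio_fildes = fd;
  cbs[0].aio_buf = (void *) a;
  cbs[0].aio_nbytes = strlen (a);
  cbs[0].aio_offset = 0;

  cbs[1].aio_lio_opcode = LIO_NOP;    /* A hole; this entry is skipped.  */

  cbs[2].aio_lio_opcode = LIO_WRITE;
  cbs[2].aio_fildes = fd;
  cbs[2].aio_buf = (void *) b;
  cbs[2].aio_nbytes = strlen (b);
  cbs[2].aio_offset = (off_t) strlen (a);

  /* With LIO_WAIT the call returns only after all requests terminated,
     so no notification is needed and SIG can be a null pointer.  */
  result = lio_listio (LIO_WAIT, list, 3, NULL);

  /* The return value does not say which request failed, so the status
     of every real request is examined.  */
  for (i = 0; i < 3; i++)
    {
      int err;
      ssize_t n;

      if (cbs[i].aio_lio_opcode == LIO_NOP)
        continue;
      err = aio_error (&cbs[i]);
      n = aio_return (&cbs[i]);
      if (err != 0 || n != (ssize_t) cbs[i].aio_nbytes)
        result = -1;
    }
  return result;
}
```

Because the two writes target disjoint ranges of the file, the result
does not depend on the order in which the requests are carried out.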
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun int lio_listio64 (int @var{mode}, struct aiocb64 *const @var{list}[], int @var{nent}, struct sigevent *@var{sig})
|
|
This function is similar to the @code{lio_listio} function. The only
|
|
difference is that on @w{32 bit} machines the file descriptor should
|
|
be opened in the large file mode. Internally @code{lio_listio64} uses
|
|
functionality equivalent to @code{lseek64} (@pxref{File Position Primitive})
to position the file descriptor correctly for the reading or
|
|
writing, as opposed to @code{lseek} functionality used in
|
|
@code{lio_listio}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{lio_listio} and so
|
|
transparently replaces the interface for small files on 32 bit
|
|
machines.
|
|
@end deftypefun
|
|
|
|
@node Status of AIO Operations
|
|
@subsection Getting the Status of AIO Operations
|
|
|
|
As already described in the documentation of the functions in the last
|
|
section, it must be possible to get information about the status of an I/O
|
|
request. When the operation is performed truly asynchronously (as with
|
|
@code{aio_read} and @code{aio_write} and with @code{lio_listio} when the
|
|
mode is @code{LIO_NOWAIT}) one sometimes needs to know whether a
|
|
specific request already terminated and, if so, what the result was.
|
|
The following two functions allow you to get this kind of information.
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun int aio_error (const struct aiocb *@var{aiocbp})
|
|
This function determines the error state of the request described by the
|
|
@code{struct aiocb} variable pointed to by @var{aiocbp}. If the
|
|
request has not yet terminated the value returned is always
|
|
@code{EINPROGRESS}. Once the request has terminated the value
|
|
@code{aio_error} returns is either @math{0} if the request completed
|
|
successfully or it returns the value which would be stored in the
|
|
@code{errno} variable if the request would have been done using
|
|
@code{read}, @code{write}, or @code{fsync}.
|
|
|
|
The function can return @code{ENOSYS} if it is not implemented. It
|
|
could also return @code{EINVAL} if the @var{aiocbp} parameter does not
|
|
refer to an asynchronous operation whose return status is not yet known.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{aio_error64} since the LFS interface
|
|
transparently replaces the normal implementation.
|
|
@end deftypefun
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun int aio_error64 (const struct aiocb64 *@var{aiocbp})
|
|
This function is similar to @code{aio_error} with the only difference
|
|
that the argument is a reference to a variable of type
@code{struct aiocb64}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{aio_error} and so
|
|
transparently replaces the interface for small files on 32 bit
|
|
machines.
|
|
@end deftypefun
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun ssize_t aio_return (const struct aiocb *@var{aiocbp})
|
|
This function can be used to retrieve the return status of the operation
|
|
carried out by the request described in the variable pointed to by
|
|
@var{aiocbp}. As long as the error status of this request as returned
|
|
by @code{aio_error} is @code{EINPROGRESS} the return of this function is
|
|
undefined.
|
|
|
|
Once the request is finished this function can be used exactly once to
|
|
retrieve the return value. Following calls might lead to undefined
|
|
behaviour. The return value itself is the value which would have been
|
|
returned by the @code{read}, @code{write}, or @code{fsync} call.
|
|
|
|
The function can return @code{ENOSYS} if it is not implemented. It
|
|
could also return @code{EINVAL} if the @var{aiocbp} parameter does not
|
|
refer to an asynchronous operation whose return status is not yet known.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{aio_return64} since the LFS interface
|
|
transparently replaces the normal implementation.
|
|
@end deftypefun
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun ssize_t aio_return64 (struct aiocb64 *@var{aiocbp})
|
|
This function is similar to @code{aio_return} with the only difference
|
|
that the argument is a reference to a variable of type @code{struct
|
|
aiocb64}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{aio_return} and so
|
|
transparently replaces the interface for small files on 32 bit
|
|
machines.
|
|
@end deftypefun
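
The interplay of @code{aio_error} and @code{aio_return} described above can be sketched as follows. This is only a minimal illustration, not part of the library: the helper name @code{read_async} and the 1 ms polling interval are assumptions made for the example, and polling is used purely for brevity.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: read up to LEN bytes from FD at OFFSET with one
   asynchronous request.  Returns the byte count, or -1 with errno set.  */
ssize_t
read_async (int fd, void *buf, size_t len, off_t offset)
{
  struct aiocb cb;
  struct timespec pause = { 0, 1000000 };   /* 1 ms between polls */
  int err;
  ssize_t n;

  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_buf = buf;
  cb.aio_nbytes = len;
  cb.aio_offset = offset;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  if (aio_read (&cb) != 0)
    return -1;

  /* The return status is undefined while the error status is
     EINPROGRESS, so wait for a different error status first.  */
  while ((err = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&pause, NULL);

  /* Retrieve the return status exactly once.  */
  n = aio_return (&cb);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return n;
}
```

A real program would normally do useful work between the calls to @code{aio_error} instead of sleeping.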
|
|
|
|
@node Synchronizing AIO Operations
|
|
@subsection Getting into a Consistent State
|
|
|
|
When dealing with asynchronous operations it is sometimes necessary to
|
|
get into a consistent state. This would mean for AIO that one wants to
|
|
know whether a certain request or a group of requests were processed.
|
|
This could be done by waiting for the notification sent by the system
|
|
after the operation terminated, but this sometimes would mean wasting
|
|
resources (mainly computation time). Instead POSIX.1b defines two
|
|
functions which will help with most kinds of consistency.
|
|
|
|
The @code{aio_fsync} and @code{aio_fsync64} functions are only available
|
|
if in @file{unistd.h} the symbol @code{_POSIX_SYNCHRONIZED_IO} is
|
|
defined.
|
|
|
|
@cindex synchronizing
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun int aio_fsync (int @var{op}, struct aiocb *@var{aiocbp})
|
|
Calling this function forces all I/O operations queued at the
|
|
time of the function call operating on the file descriptor
|
|
@code{aiocbp->aio_fildes} into the synchronized I/O completion state
|
|
(@pxref{Synchronizing I/O}). The @code{aio_fsync} function returns
|
|
immediately but the notification through the method described in
|
|
@code{aiocbp->aio_sigevent} will happen only after all requests for this
|
|
file descriptor have terminated and the file is synchronized. This also
|
|
means that requests for this very same file descriptor which are queued
|
|
after the synchronization request are not affected.
|
|
|
|
If @var{op} is @code{O_DSYNC} the synchronization happens as with a call
|
|
to @code{fdatasync}. Otherwise @var{op} should be @code{O_SYNC} and
|
|
the synchronization happens as with @code{fsync}.
|
|
|
|
As long as the synchronization has not happened a call to
|
|
@code{aio_error} with the reference to the object pointed to by
|
|
@var{aiocbp} returns @code{EINPROGRESS}. Once the synchronization is
|
|
done @code{aio_error} returns @math{0} if the synchronization was
|
|
successful. Otherwise the value returned is the value to which the
|
|
@code{fsync} or @code{fdatasync} function would have set the
|
|
@code{errno} variable. In this case nothing can be assumed about the
|
|
consistency for the data written to this file descriptor.
|
|
|
|
The return value of this function is @math{0} if the request was
|
|
successfully filed. Otherwise the return value is @math{-1} and
|
|
@code{errno} is set to one of the following values:
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
The request could not be enqueued due to temporary lack of resources.
|
|
@item EBADF
|
|
The file descriptor @code{aiocbp->aio_fildes} is not valid or not open
|
|
for writing.
|
|
@item EINVAL
|
|
The implementation does not support I/O synchronization or the @var{op}
|
|
parameter is other than @code{O_DSYNC} and @code{O_SYNC}.
|
|
@item ENOSYS
|
|
This function is not implemented.
|
|
@end table
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{aio_fsync64} since the LFS interface
|
|
transparently replaces the normal implementation.
|
|
@end deftypefun
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun int aio_fsync64 (int @var{op}, struct aiocb64 *@var{aiocbp})
|
|
This function is similar to @code{aio_fsync} with the only difference
|
|
that the argument is a reference to a variable of type @code{struct
|
|
aiocb64}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{aio_fsync} and so
|
|
transparently replaces the interface for small files on 32 bit
|
|
machines.
|
|
@end deftypefun
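
As a rough sketch of how @code{aio_fsync} is combined with @code{aio_error}, the following hypothetical helper @code{sync_async} queues a synchronization request with @code{O_SYNC} and waits for it to finish. The helper name and the polling loop are assumptions for illustration only.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: queue an fsync-like request for FD and wait for
   it.  Returns 0 on success, or -1 with errno set.  */
int
sync_async (int fd)
{
  struct aiocb cb;
  struct timespec pause = { 0, 1000000 };   /* 1 ms between polls */
  int err;

  memset (&cb, 0, sizeof cb);
  cb.aio_fildes = fd;
  cb.aio_sigevent.sigev_notify = SIGEV_NONE;

  /* A return value of 0 only means the request was queued.  */
  if (aio_fsync (O_SYNC, &cb) != 0)
    return -1;

  /* The error status stays EINPROGRESS until the file is synchronized.  */
  while ((err = aio_error (&cb)) == EINPROGRESS)
    nanosleep (&pause, NULL);

  (void) aio_return (&cb);
  if (err != 0)
    {
      errno = err;
      return -1;
    }
  return 0;
}
```

The descriptor must be open for writing; otherwise the request fails with @code{EBADF} as listed above.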
|
|
|
|
Another method of synchronization is to wait until one or more requests of a
|
|
specific set have terminated. This could be achieved by the @code{aio_*}
|
|
functions to notify the initiating process about the termination but in
|
|
some situations this is not the ideal solution. In a program which
|
|
constantly updates clients somehow connected to the server it is not
|
|
always the best solution to go round robin since some connections might
|
|
be slow. On the other hand letting the @code{aio_*} function notify the
|
|
caller might also be not the best solution since whenever the process
|
|
works on preparing data for one client it makes no sense to be
|
|
interrupted by a notification since the new client will not be handled
|
|
before the current client is served. For situations like this
|
|
@code{aio_suspend} should be used.
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun int aio_suspend (const struct aiocb *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout})
|
|
When calling this function the calling thread is suspended until at
|
|
least one of the requests pointed to by the @var{nent} elements of the
|
|
array @var{list} has completed. If any of the requests already has
|
|
completed at the time @code{aio_suspend} is called the function returns
|
|
immediately. Whether a request has terminated or not is determined by
|
|
comparing the error status of the request with @code{EINPROGRESS}. If
|
|
an element of @var{list} is @code{NULL} the entry is simply ignored.
|
|
|
|
If no request has finished the calling process is suspended. If
@var{timeout} is @code{NULL} the process is not woken until a request
has finished. If @var{timeout} is not @code{NULL} the process remains
suspended for at most the time specified in @var{timeout}; if that time
elapses before any request has finished, @code{aio_suspend} returns with
an error.
|
|
|
|
The return value of the function is @math{0} if one or more requests
|
|
from the @var{list} have terminated. Otherwise the function returns
|
|
@math{-1} and @code{errno} is set to one of the following values:
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
None of the requests from the @var{list} completed in the time specified
|
|
by @var{timeout}.
|
|
@item EINTR
|
|
A signal interrupted the @code{aio_suspend} function. This signal might
|
|
also be sent by the AIO implementation while signalling the termination
|
|
of one of the requests.
|
|
@item ENOSYS
|
|
The @code{aio_suspend} function is not implemented.
|
|
@end table
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{aio_suspend64} since the LFS interface
|
|
transparently replaces the normal implementation.
|
|
@end deftypefun
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun int aio_suspend64 (const struct aiocb64 *const @var{list}[], int @var{nent}, const struct timespec *@var{timeout})
|
|
This function is similar to @code{aio_suspend} with the only difference
|
|
that the elements of @var{list} point to variables of type @code{struct
|
|
aiocb64}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{aio_suspend} and so
|
|
transparently replaces the interface for small files on 32 bit
|
|
machines.
|
|
@end deftypefun
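
A minimal sketch of waiting for a single request with @code{aio_suspend} might look like this. The helper name @code{wait_for_request} is hypothetical, and for simplicity the full timeout is restarted whenever a signal interrupts the wait.

```c
#include <aio.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: wait until the request *CB has finished or
   TIMEOUT has elapsed (wait indefinitely if TIMEOUT is NULL).
   Returns 0 if the request finished; otherwise -1 with errno set,
   for example to EAGAIN when the timeout expired.  */
int
wait_for_request (const struct aiocb *cb, const struct timespec *timeout)
{
  const struct aiocb *const list[1] = { cb };

  for (;;)
    {
      if (aio_suspend (list, 1, timeout) == 0)
        return 0;
      if (errno != EINTR)
        return -1;
      /* Interrupted by a signal: wait again with the same timeout.  */
    }
}
```

After a successful return the caller still has to use @code{aio_error} and @code{aio_return} on the request to learn its outcome.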
|
|
|
|
@node Cancel AIO Operations
|
|
@subsection Cancellation of AIO Operations
|
|
|
|
When one or more requests are asynchronously processed it might be
|
|
useful in some situations to cancel a selected operation, e.g., if it
|
|
becomes obvious that the written data is no longer accurate and would
|
|
have to be overwritten soon. As an example assume an application, which
|
|
writes data in files in a situation where new incoming data would have
|
|
to be written in a file which will be updated by an enqueued request.
|
|
The POSIX AIO implementation provides such a function but this function
|
|
is not capable of forcing the cancellation of the request. It is up to the
|
|
implementation to decide whether it is possible to cancel the operation
|
|
or not. Therefore using this function is merely a hint.
|
|
|
|
@comment aio.h
|
|
@comment POSIX.1b
|
|
@deftypefun int aio_cancel (int @var{fildes}, struct aiocb *@var{aiocbp})
|
|
The @code{aio_cancel} function can be used to cancel one or more
|
|
outstanding requests. If the @var{aiocbp} parameter is @code{NULL} the
|
|
function tries to cancel all outstanding requests which would process
|
|
the file descriptor @var{fildes} (i.e., whose @code{aio_fildes} member
|
|
is @var{fildes}). If @var{aiocbp} is not @code{NULL} the function tries
to cancel only the specific request pointed to by @var{aiocbp}.
|
|
|
|
For requests which were successfully cancelled the normal notification
|
|
about the termination of the request should take place. I.e., depending
|
|
on the @code{struct sigevent} object which controls this, nothing
|
|
happens, a signal is sent or a thread is started. If the request cannot
|
|
be cancelled it terminates the usual way after performing the operation.
|
|
|
|
After a request is successfully cancelled a call to @code{aio_error} with
|
|
a reference to this request as the parameter will return
|
|
@code{ECANCELED} and a call to @code{aio_return} will return @math{-1}.
|
|
If the request wasn't cancelled and is still running the error status is
|
|
still @code{EINPROGRESS}.
|
|
|
|
The return value of the function is @code{AIO_CANCELED} if there were
|
|
requests which haven't terminated and which successfully were cancelled.
|
|
If one or more requests are left which couldn't be cancelled the
|
|
return value is @code{AIO_NOTCANCELED}. In this case @code{aio_error}
|
|
must be used to find out which of the perhaps multiple requests (if
|
|
@var{aiocbp} is @code{NULL}) wasn't successfully cancelled. If all
|
|
requests already terminated at the time @code{aio_cancel} is called the
|
|
return value is @code{AIO_ALLDONE}.
|
|
|
|
If an error occurred during the execution of @code{aio_cancel} the
|
|
function returns @math{-1} and sets @code{errno} to one of the following
|
|
values.
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The file descriptor @var{fildes} is not valid.
|
|
@item ENOSYS
|
|
@code{aio_cancel} is not implemented.
|
|
@end table
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is in fact @code{aio_cancel64} since the LFS interface
|
|
transparently replaces the normal implementation.
|
|
@end deftypefun
|
|
|
|
@comment aio.h
|
|
@comment Unix98
|
|
@deftypefun int aio_cancel64 (int @var{fildes}, struct aiocb64 *@var{aiocbp})
|
|
This function is similar to @code{aio_cancel} with the only difference
|
|
that the argument is a reference to a variable of type @code{struct
|
|
aiocb64}.
|
|
|
|
When the sources are compiled with @code{_FILE_OFFSET_BITS == 64} this
|
|
function is available under the name @code{aio_cancel} and so
|
|
transparently replaces the interface for small files on 32 bit
|
|
machines.
|
|
@end deftypefun
|
|
|
|
@node Configuration of AIO
|
|
@subsection How to optimize the AIO implementation
|
|
|
|
The POSIX standard does not specify how the AIO functions are
|
|
implemented. They could be system calls but it is also possible to
|
|
emulate them at userlevel.
|
|
|
|
At least the available implementation at the point of this writing is a
|
|
userlevel implementation which uses threads for handling the enqueued
|
|
requests. This implementation requires making some decisions about
limitations. Since hard limitations are best avoided, the GNU C library
provides a means of tuning the AIO implementation for each individual
use.
|
|
|
|
@comment aio.h
|
|
@comment GNU
|
|
@deftp {Data Type} {struct aioinit}
|
|
This data type is used to pass the configuration or tunable parameters
|
|
to the implementation. The program has to initialize the members of
|
|
this struct and pass it to the implementation using the @code{aio_init}
|
|
function.
|
|
|
|
@table @code
|
|
@item int aio_threads
|
|
This member specifies the maximal number of threads which may be used
|
|
at any one time.
|
|
@item int aio_num
|
|
This number provides an estimate on the maximal number of simultaneously
|
|
enqueued requests.
|
|
@item int aio_locks
|
|
@c What?
|
|
@item int aio_usedba
|
|
@c What?
|
|
@item int aio_debug
|
|
@c What?
|
|
@item int aio_numusers
|
|
@c What?
|
|
@item int aio_reserved[2]
|
|
@c What?
|
|
@end table
|
|
@end deftp
|
|
|
|
@comment aio.h
|
|
@comment GNU
|
|
@deftypefun void aio_init (const struct aioinit *@var{init})
|
|
This function must be called before any other AIO function. Calling it
|
|
is completely voluntary, as it is only meant to help the AIO
|
|
implementation to perform better.
|
|
|
|
Before calling the @code{aio_init} function the members of a variable of
|
|
type @code{struct aioinit} must be initialized. Then a reference to
|
|
this variable is passed as the parameter to @code{aio_init} which itself
|
|
may or may not pay attention to the hints.
|
|
|
|
The function has no return value and no error cases are defined. It is
|
|
an extension which follows a proposal from the SGI implementation in
|
|
@w{Irix 6}. It is not covered by POSIX.1b or Unix98.
|
|
@end deftypefun
|
|
|
|
@node Control Operations
|
|
@section Control Operations on Files
|
|
|
|
@cindex control operations on files
|
|
@cindex @code{fcntl} function
|
|
This section describes how you can perform various other operations on
|
|
file descriptors, such as inquiring about or setting flags describing
|
|
the status of the file descriptor, manipulating record locks, and the
|
|
like. All of these operations are performed by the function @code{fcntl}.
|
|
|
|
The second argument to the @code{fcntl} function is a command that
|
|
specifies which operation to perform. The function and macros that name
|
|
various flags that are used with it are declared in the header file
|
|
@file{fcntl.h}. Many of these flags are also used by the @code{open}
|
|
function; see @ref{Opening and Closing Files}.
|
|
@pindex fcntl.h
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypefun int fcntl (int @var{filedes}, int @var{command}, @dots{})
|
|
The @code{fcntl} function performs the operation specified by
|
|
@var{command} on the file descriptor @var{filedes}. Some commands
|
|
require additional arguments to be supplied. These additional arguments
|
|
and the return value and error conditions are given in the detailed
|
|
descriptions of the individual commands.
|
|
|
|
Briefly, here is a list of what the various commands are.
|
|
|
|
@table @code
|
|
@item F_DUPFD
|
|
Duplicate the file descriptor (return another file descriptor pointing
|
|
to the same open file). @xref{Duplicating Descriptors}.
|
|
|
|
@item F_GETFD
|
|
Get flags associated with the file descriptor. @xref{Descriptor Flags}.
|
|
|
|
@item F_SETFD
|
|
Set flags associated with the file descriptor. @xref{Descriptor Flags}.
|
|
|
|
@item F_GETFL
|
|
Get flags associated with the open file. @xref{File Status Flags}.
|
|
|
|
@item F_SETFL
|
|
Set flags associated with the open file. @xref{File Status Flags}.
|
|
|
|
@item F_GETLK
|
|
Get a file lock. @xref{File Locks}.
|
|
|
|
@item F_SETLK
|
|
Set or clear a file lock. @xref{File Locks}.
|
|
|
|
@item F_SETLKW
|
|
Like @code{F_SETLK}, but wait for completion. @xref{File Locks}.
|
|
|
|
@item F_GETOWN
|
|
Get process or process group ID to receive @code{SIGIO} signals.
|
|
@xref{Interrupt Input}.
|
|
|
|
@item F_SETOWN
|
|
Set process or process group ID to receive @code{SIGIO} signals.
|
|
@xref{Interrupt Input}.
|
|
@end table
|
|
|
|
This function is a cancellation point in multi-threaded programs. This
|
|
is a problem if the thread allocates some resources (like memory, file
|
|
descriptors, semaphores or whatever) at the time @code{fcntl} is
|
|
called. If the thread gets cancelled these resources stay allocated
|
|
until the program ends. To avoid this, calls to @code{fcntl} should be
|
|
protected using cancellation handlers.
|
|
@c ref pthread_cleanup_push / pthread_cleanup_pop
|
|
@end deftypefun
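
One way to protect a call to @code{fcntl} with a cancellation handler is sketched below using @code{pthread_cleanup_push} and @code{pthread_cleanup_pop}. The helper names @code{release_buffer} and @code{lock_whole_file}, and the scratch buffer standing in for an allocated resource, are assumptions made for the example.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Cleanup handler: release the memory owned by the thread.  */
static void
release_buffer (void *p)
{
  free (p);
}

/* Hypothetical helper: wait for a write lock on the whole file FD while
   holding an allocated buffer.  Returns the result of fcntl.  */
int
lock_whole_file (int fd)
{
  int result;
  char *scratch = malloc (4096);

  if (scratch == NULL)
    return -1;

  /* If the thread is cancelled inside fcntl, the handler frees SCRATCH.  */
  pthread_cleanup_push (release_buffer, scratch);
  {
    struct flock lock;

    memset (&lock, 0, sizeof lock);
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;             /* zero length means the whole file */
    result = fcntl (fd, F_SETLKW, &lock);
  }
  /* A nonzero argument runs the handler on the normal path too.  */
  pthread_cleanup_pop (1);

  return result;
}
```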
|
|
|
|
|
|
@node Duplicating Descriptors
|
|
@section Duplicating Descriptors
|
|
|
|
@cindex duplicating file descriptors
|
|
@cindex redirecting input and output
|
|
|
|
You can @dfn{duplicate} a file descriptor, or allocate another file
|
|
descriptor that refers to the same open file as the original. Duplicate
|
|
descriptors share one file position and one set of file status flags
|
|
(@pxref{File Status Flags}), but each has its own set of file descriptor
|
|
flags (@pxref{Descriptor Flags}).
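
The shared file position can be observed with a small experiment such as the one sketched here. The helper name @code{duplicates_share_position} is hypothetical, and the sketch assumes @var{fd} refers to a seekable file.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if moving the file position through a
   duplicate of FD is visible through FD itself, 0 if not, and -1 on
   error.  Note that this changes the file position of FD.  */
int
duplicates_share_position (int fd)
{
  off_t moved, seen;
  int copy = dup (fd);

  if (copy < 0)
    return -1;

  moved = lseek (copy, 3, SEEK_SET);   /* move via the duplicate */
  seen = lseek (fd, 0, SEEK_CUR);      /* query via the original */
  close (copy);

  if (moved < 0 || seen < 0)
    return -1;
  return moved == seen;
}
```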
|
|
|
|
The major use of duplicating a file descriptor is to implement
|
|
@dfn{redirection} of input or output: that is, to change the
|
|
file or pipe that a particular file descriptor corresponds to.
|
|
|
|
You can perform this operation using the @code{fcntl} function with the
|
|
@code{F_DUPFD} command, but there are also convenient functions
|
|
@code{dup} and @code{dup2} for duplicating descriptors.
|
|
|
|
@pindex unistd.h
|
|
@pindex fcntl.h
|
|
The @code{fcntl} function and flags are declared in @file{fcntl.h},
|
|
while prototypes for @code{dup} and @code{dup2} are in the header file
|
|
@file{unistd.h}.
|
|
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@deftypefun int dup (int @var{old})
|
|
This function copies descriptor @var{old} to the first available
|
|
descriptor number (the first number not currently open). It is
|
|
equivalent to @code{fcntl (@var{old}, F_DUPFD, 0)}.
|
|
@end deftypefun
|
|
|
|
@comment unistd.h
|
|
@comment POSIX.1
|
|
@deftypefun int dup2 (int @var{old}, int @var{new})
|
|
This function copies the descriptor @var{old} to descriptor number
|
|
@var{new}.
|
|
|
|
If @var{old} is an invalid descriptor, then @code{dup2} does nothing; it
|
|
does not close @var{new}. Otherwise, the new duplicate of @var{old}
|
|
replaces any previous meaning of descriptor @var{new}, as if @var{new}
|
|
were closed first.
|
|
|
|
If @var{old} and @var{new} are different numbers, and @var{old} is a
|
|
valid descriptor number, then @code{dup2} is equivalent to:
|
|
|
|
@smallexample
|
|
close (@var{new});
|
|
fcntl (@var{old}, F_DUPFD, @var{new})
|
|
@end smallexample
|
|
|
|
However, @code{dup2} does this atomically; there is no instant in the
|
|
middle of calling @code{dup2} at which @var{new} is closed and not yet a
|
|
duplicate of @var{old}.
|
|
@end deftypefun
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_DUPFD
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
copy the file descriptor given as the first argument.
|
|
|
|
The form of the call in this case is:
|
|
|
|
@smallexample
|
|
fcntl (@var{old}, F_DUPFD, @var{next-filedes})
|
|
@end smallexample
|
|
|
|
The @var{next-filedes} argument is of type @code{int} and specifies that
|
|
the file descriptor returned should be the next available one greater
|
|
than or equal to this value.
|
|
|
|
The return value from @code{fcntl} with this command is normally the value
|
|
of the new file descriptor. A return value of @math{-1} indicates an
|
|
error. The following @code{errno} error conditions are defined for
|
|
this command:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{old} argument is invalid.
|
|
|
|
@item EINVAL
|
|
The @var{next-filedes} argument is invalid.
|
|
|
|
@item EMFILE
|
|
There are no more file descriptors available---your program is already
|
|
using the maximum. In BSD and GNU, the maximum is controlled by a
|
|
resource limit that can be changed; @pxref{Limits on Resources}, for
|
|
more information about the @code{RLIMIT_NOFILE} limit.
|
|
@end table
|
|
|
|
@code{ENFILE} is not a possible error code for @code{dup2} because
|
|
@code{dup2} does not create a new opening of a file; duplicate
|
|
descriptors do not count toward the limit which @code{ENFILE}
|
|
indicates. @code{EMFILE} is possible because it refers to the limit on
|
|
distinct descriptor numbers in use in one process.
|
|
@end deftypevr
|
|
|
|
Here is an example showing how to use @code{dup2} to do redirection.
|
|
Typically, redirection of the standard streams (like @code{stdin}) is
|
|
done by a shell or shell-like program before calling one of the
|
|
@code{exec} functions (@pxref{Executing a File}) to execute a new
|
|
program in a child process. When the new program is executed, it
|
|
creates and initializes the standard streams to point to the
|
|
corresponding file descriptors, before its @code{main} function is
|
|
invoked.
|
|
|
|
So, to redirect standard input to a file, the shell could do something
|
|
like:
|
|
|
|
@smallexample
|
|
pid = fork ();
|
|
if (pid == 0)
|
|
@{
|
|
char *filename;
|
|
char *program;
|
|
int file;
|
|
@dots{}
|
|
file = TEMP_FAILURE_RETRY (open (filename, O_RDONLY));
|
|
dup2 (file, STDIN_FILENO);
|
|
TEMP_FAILURE_RETRY (close (file));
|
|
execv (program, NULL);
|
|
@}
|
|
@end smallexample
|
|
|
|
There is also a more detailed example showing how to implement redirection
|
|
in the context of a pipeline of processes in @ref{Launching Jobs}.
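
Redirection can also be done temporarily within a single process by saving a duplicate of the original descriptor first. The following sketch assumes two hypothetical helpers, @code{redirect_stdout} and @code{restore_stdout}; it works at the descriptor level, so a program using the @code{stdout} stream would additionally have to flush it around these calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make standard output refer to the file PATH.
   Returns a descriptor for the previous standard output, to be passed
   to restore_stdout later, or -1 on error.  */
int
redirect_stdout (const char *path)
{
  int saved, file;

  file = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (file < 0)
    return -1;

  saved = dup (STDOUT_FILENO);
  if (saved < 0 || dup2 (file, STDOUT_FILENO) < 0)
    {
      if (saved >= 0)
        close (saved);
      close (file);
      return -1;
    }

  /* STDOUT_FILENO now refers to the file; the extra descriptor is
     no longer needed.  */
  close (file);
  return saved;
}

/* Hypothetical helper: undo redirect_stdout.  Returns 0 or -1.  */
int
restore_stdout (int saved)
{
  int result = dup2 (saved, STDOUT_FILENO) < 0 ? -1 : 0;

  close (saved);
  return result;
}
```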
|
|
|
|
|
|
@node Descriptor Flags
|
|
@section File Descriptor Flags
|
|
@cindex file descriptor flags
|
|
|
|
@dfn{File descriptor flags} are miscellaneous attributes of a file
|
|
descriptor. These flags are associated with particular file
|
|
descriptors, so that if you have created duplicate file descriptors
|
|
from a single opening of a file, each descriptor has its own set of flags.
|
|
|
|
Currently there is just one file descriptor flag: @code{FD_CLOEXEC},
|
|
which causes the descriptor to be closed if you use any of the
|
|
@code{exec@dots{}} functions (@pxref{Executing a File}).
|
|
|
|
The symbols in this section are defined in the header file
|
|
@file{fcntl.h}.
|
|
@pindex fcntl.h
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_GETFD
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
specify that it should return the file descriptor flags associated
|
|
with the @var{filedes} argument.
|
|
|
|
The normal return value from @code{fcntl} with this command is a
|
|
nonnegative number which can be interpreted as the bitwise OR of the
|
|
individual flags (except that currently there is only one flag to use).
|
|
|
|
In case of an error, @code{fcntl} returns @math{-1}. The following
|
|
@code{errno} error conditions are defined for this command:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{filedes} argument is invalid.
|
|
@end table
|
|
@end deftypevr
|
|
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_SETFD
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
specify that it should set the file descriptor flags associated with the
|
|
@var{filedes} argument. This requires a third @code{int} argument to
|
|
specify the new flags, so the form of the call is:
|
|
|
|
@smallexample
|
|
fcntl (@var{filedes}, F_SETFD, @var{new-flags})
|
|
@end smallexample
|
|
|
|
The normal return value from @code{fcntl} with this command is an
|
|
unspecified value other than @math{-1}, which indicates an error.
|
|
The flags and error conditions are the same as for the @code{F_GETFD}
|
|
command.
|
|
@end deftypevr
|
|
|
|
The following macro is defined for use as a file descriptor flag with
|
|
the @code{fcntl} function. The value is an integer constant usable
|
|
as a bit mask value.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int FD_CLOEXEC
|
|
@cindex close-on-exec (file descriptor flag)
|
|
This flag specifies that the file descriptor should be closed when
|
|
an @code{exec} function is invoked; see @ref{Executing a File}. When
|
|
a file descriptor is allocated (as with @code{open} or @code{dup}),
|
|
this bit is initially cleared on the new file descriptor, meaning that
|
|
descriptor will survive into the new program after @code{exec}.
|
|
@end deftypevr
|
|
|
|
If you want to modify the file descriptor flags, you should get the
|
|
current flags with @code{F_GETFD} and modify the value. Don't assume
|
|
that the flags listed here are the only ones that are implemented; your
|
|
program may be run years from now and more flags may exist then. For
|
|
example, here is a function to set or clear the flag @code{FD_CLOEXEC}
|
|
without altering any other flags:
|
|
|
|
@smallexample
|
|
/* @r{Set the @code{FD_CLOEXEC} flag of @var{desc} if @var{value} is nonzero,}
|
|
@r{or clear the flag if @var{value} is 0.}
|
|
@r{Return 0 on success, or -1 on error with @code{errno} set.} */
|
|
|
|
int
|
|
set_cloexec_flag (int desc, int value)
|
|
@{
|
|
int oldflags = fcntl (desc, F_GETFD, 0);
|
|
  /* @r{If reading the flags failed, return error indication now.} */
|
|
if (oldflags < 0)
|
|
return oldflags;
|
|
/* @r{Set just the flag we want to set.} */
|
|
if (value != 0)
|
|
oldflags |= FD_CLOEXEC;
|
|
else
|
|
oldflags &= ~FD_CLOEXEC;
|
|
/* @r{Store modified flag word in the descriptor.} */
|
|
return fcntl (desc, F_SETFD, oldflags);
|
|
@}
|
|
@end smallexample
|
|
|
|
@node File Status Flags
|
|
@section File Status Flags
|
|
@cindex file status flags
|
|
|
|
@dfn{File status flags} are used to specify attributes of the opening of a
|
|
file. Unlike the file descriptor flags discussed in @ref{Descriptor
|
|
Flags}, the file status flags are shared by duplicated file descriptors
|
|
resulting from a single opening of the file. The file status flags are
|
|
specified with the @var{flags} argument to @code{open};
|
|
@pxref{Opening and Closing Files}.
|
|
|
|
File status flags fall into three categories, which are described in the
|
|
following sections.
|
|
|
|
@itemize @bullet
|
|
@item
|
|
@ref{Access Modes}, specify what type of access is allowed to the
|
|
file: reading, writing, or both. They are set by @code{open} and are
|
|
returned by @code{fcntl}, but cannot be changed.
|
|
|
|
@item
|
|
@ref{Open-time Flags}, control details of what @code{open} will do.
|
|
These flags are not preserved after the @code{open} call.
|
|
|
|
@item
|
|
@ref{Operating Modes}, affect how operations such as @code{read} and
|
|
@code{write} are done. They are set by @code{open}, and can be fetched or
|
|
changed with @code{fcntl}.
|
|
@end itemize
|
|
|
|
The symbols in this section are defined in the header file
|
|
@file{fcntl.h}.
|
|
@pindex fcntl.h
|
|
|
|
@menu
|
|
* Access Modes:: Whether the descriptor can read or write.
|
|
* Open-time Flags:: Details of @code{open}.
|
|
* Operating Modes:: Special modes to control I/O operations.
|
|
* Getting File Status Flags:: Fetching and changing these flags.
|
|
@end menu
|
|
|
|
@node Access Modes
|
|
@subsection File Access Modes
|
|
|
|
The file access modes allow a file descriptor to be used for reading,
|
|
writing, or both. (In the GNU system, they can also allow none of these,
|
|
and allow execution of the file as a program.) The access modes are chosen
|
|
when the file is opened, and never change.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_RDONLY
|
|
Open the file for read access.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_WRONLY
|
|
Open the file for write access.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_RDWR
|
|
Open the file for both reading and writing.
|
|
@end deftypevr
|
|
|
|
In the GNU system (and not in other systems), @code{O_RDONLY} and
|
|
@code{O_WRONLY} are independent bits that can be bitwise-ORed together,
|
|
and it is valid for either bit to be set or clear. This means that
|
|
@code{O_RDWR} is the same as @code{O_RDONLY|O_WRONLY}. A file access
|
|
mode of zero is permissible; it allows no operations that do input or
|
|
output to the file, but does allow other operations such as
|
|
@code{fchmod}. On the GNU system, since ``read-only'' or ``write-only''
|
|
is a misnomer, @file{fcntl.h} defines additional names for the file
|
|
access modes. These names are preferred when writing GNU-specific code.
|
|
But most programs will want to be portable to other POSIX.1 systems and
|
|
should use the POSIX.1 names above instead.
|
|
|
|
@comment fcntl.h
|
|
@comment GNU
|
|
@deftypevr Macro int O_READ
|
|
Open the file for reading. Same as @code{O_RDONLY}; only defined on GNU.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment GNU
|
|
@deftypevr Macro int O_WRITE
|
|
Open the file for writing. Same as @code{O_WRONLY}; only defined on GNU.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment GNU
|
|
@deftypevr Macro int O_EXEC
|
|
Open the file for executing. Only defined on GNU.
|
|
@end deftypevr
|
|
|
|
To determine the file access mode with @code{fcntl}, you must extract
|
|
the access mode bits from the retrieved file status flags. In the GNU
|
|
system, you can just test the @code{O_READ} and @code{O_WRITE} bits in
|
|
the flags word. But in other POSIX.1 systems, reading and writing
|
|
access modes are not stored as distinct bit flags. The portable way to
|
|
extract the file access mode bits is with @code{O_ACCMODE}.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_ACCMODE
|
|
This macro stands for a mask that can be bitwise-ANDed with the file
|
|
status flag value to produce a value representing the file access mode.
|
|
The mode will be @code{O_RDONLY}, @code{O_WRONLY}, or @code{O_RDWR}.
|
|
(In the GNU system it could also be zero, and it never includes the
|
|
@code{O_EXEC} bit.)
|
|
@end deftypevr
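
The portable way of extracting the access mode can be sketched like this. The helper name @code{access_mode_name} and the strings it returns are assumptions made for the example.

```c
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical helper: describe the access mode of FD.  Returns NULL if
   the file status flags cannot be read.  */
const char *
access_mode_name (int fd)
{
  int flags = fcntl (fd, F_GETFL);

  if (flags < 0)
    return NULL;

  /* Mask out everything but the access mode before comparing.  */
  switch (flags & O_ACCMODE)
    {
    case O_RDONLY:
      return "read-only";
    case O_WRONLY:
      return "write-only";
    case O_RDWR:
      return "read-write";
    default:
      /* For example an access mode of zero on the GNU system.  */
      return "other";
    }
}
```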
|
|
|
|
@node Open-time Flags
|
|
@subsection Open-time Flags
|
|
|
|
The open-time flags specify options affecting how @code{open} will behave.
|
|
These options are not preserved once the file is open. The exception to
|
|
this is @code{O_NONBLOCK}, which is also an I/O operating mode and so it
|
|
@emph{is} saved. @xref{Opening and Closing Files}, for how to call
|
|
@code{open}.
|
|
|
|
There are two sorts of options specified by open-time flags.
|
|
|
|
@itemize @bullet
|
|
@item
|
|
@dfn{File name translation flags} affect how @code{open} looks up the
|
|
file name to locate the file, and whether the file can be created.
|
|
@cindex file name translation flags
|
|
@cindex flags, file name translation
|
|
|
|
@item
|
|
@dfn{Open-time action flags} specify extra operations that @code{open} will
|
|
perform on the file once it is open.
|
|
@cindex open-time action flags
|
|
@cindex flags, open-time action
|
|
@end itemize
|
|
|
|
Here are the file name translation flags.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_CREAT
|
|
If set, the file will be created if it doesn't already exist.
|
|
@c !!! mode arg, umask
|
|
@cindex create on open (file status flag)
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_EXCL
|
|
If both @code{O_CREAT} and @code{O_EXCL} are set, then @code{open} fails
|
|
if the specified file already exists. This is guaranteed to never
|
|
clobber an existing file.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_NONBLOCK
|
|
@cindex non-blocking open
|
|
This prevents @code{open} from blocking for a ``long time'' to open the
|
|
file. This is only meaningful for some kinds of files, usually devices
|
|
such as serial ports; when it is not meaningful, it is harmless and
|
|
ignored. Often opening a port to a modem blocks until the modem reports
|
|
carrier detection; if @code{O_NONBLOCK} is specified, @code{open} will
|
|
return immediately without a carrier.
|
|
|
|
Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O operating
|
|
mode and a file name translation flag. This means that specifying
|
|
@code{O_NONBLOCK} in @code{open} also sets nonblocking I/O mode;
|
|
@pxref{Operating Modes}. To open the file without blocking but do normal
|
|
I/O that blocks, you must call @code{open} with @code{O_NONBLOCK} set and
|
|
then call @code{fcntl} to turn the bit off.
|
|
@end deftypevr
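
The two-step procedure just described might look like the following
sketch.  The function name @code{open_then_block} is hypothetical, and
the sketch assumes that @var{flags} does not contain @code{O_CREAT}
(which would need a mode argument).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open PATH with FLAGS without waiting in open itself, then switch
   the descriptor back to ordinary blocking I/O.
   Return the descriptor, or -1 with errno set.
   FLAGS is assumed not to include O_CREAT.
   The name open_then_block is hypothetical.  */
int
open_then_block (const char *path, int flags)
{
  int fd = open (path, flags | O_NONBLOCK);
  if (fd == -1)
    return -1;

  /* Fetch the current status flags and clear only O_NONBLOCK.  */
  int mode = fcntl (fd, F_GETFL, 0);
  if (mode == -1 || fcntl (fd, F_SETFL, mode & ~O_NONBLOCK) == -1)
    {
      int saved_errno = errno;   /* close may change errno */
      close (fd);
      errno = saved_errno;
      return -1;
    }
  return fd;
}
```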
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_NOCTTY
|
|
If the named file is a terminal device, don't make it the controlling
|
|
terminal for the process. @xref{Job Control}, for information about
|
|
what it means to be the controlling terminal.
|
|
|
|
In the GNU system and 4.4 BSD, opening a file never makes it the
|
|
controlling terminal and @code{O_NOCTTY} is zero. However, other
|
|
systems may use a nonzero value for @code{O_NOCTTY} and set the
|
|
controlling terminal when you open a file that is a terminal device; so
|
|
to be portable, use @code{O_NOCTTY} when it is important to avoid this.
|
|
@cindex controlling terminal, setting
|
|
@end deftypevr
|
|
|
|
The following three file name translation flags exist only in the GNU system.
|
|
|
|
@comment fcntl.h
|
|
@comment GNU
|
|
@deftypevr Macro int O_IGNORE_CTTY
|
|
Do not recognize the named file as the controlling terminal, even if it
|
|
refers to the process's existing controlling terminal device. Operations
|
|
on the new file descriptor will never induce job control signals.
|
|
@xref{Job Control}.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment GNU
|
|
@deftypevr Macro int O_NOLINK
|
|
If the named file is a symbolic link, open the link itself instead of
|
|
the file it refers to. (@code{fstat} on the new file descriptor will
|
|
return the information returned by @code{lstat} on the link's name.)
|
|
@cindex symbolic link, opening
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment GNU
|
|
@deftypevr Macro int O_NOTRANS
|
|
If the named file is specially translated, do not invoke the translator.
|
|
Open the bare file the translator itself sees.
|
|
@end deftypevr
|
|
|
|
|
|
The open-time action flags tell @code{open} to do additional operations
|
|
which are not really related to opening the file. The reason to do them
|
|
as part of @code{open} instead of in separate calls is that @code{open}
|
|
can do them @i{atomically}.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_TRUNC
|
|
Truncate the file to zero length. This option is only useful for
|
|
regular files, not special files such as directories or FIFOs. POSIX.1
|
|
requires that you open the file for writing to use @code{O_TRUNC}. In
|
|
BSD and GNU you must have permission to write the file to truncate it,
|
|
but you need not open for write access.
|
|
|
|
This is the only open-time action flag specified by POSIX.1. There is
|
|
no good reason for truncation to be done by @code{open}, instead of by
|
|
calling @code{ftruncate} afterwards. The @code{O_TRUNC} flag existed in
|
|
Unix before @code{ftruncate} was invented, and is retained for backward
|
|
compatibility.
|
|
@end deftypevr
|
|
|
|
The remaining open-time action flags are BSD extensions.  They exist only
|
|
on some systems. On other systems, these macros are not defined.
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int O_SHLOCK
|
|
Acquire a shared lock on the file, as with @code{flock}.
|
|
@xref{File Locks}.
|
|
|
|
If @code{O_CREAT} is specified, the locking is done atomically when
|
|
creating the file. You are guaranteed that no other process will get
|
|
the lock on the new file first.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int O_EXLOCK
|
|
Acquire an exclusive lock on the file, as with @code{flock}.
|
|
@xref{File Locks}. This is atomic like @code{O_SHLOCK}.
|
|
@end deftypevr
|
|
|
|
@node Operating Modes
|
|
@subsection I/O Operating Modes
|
|
|
|
The operating modes affect how input and output operations using a file
|
|
descriptor work. These flags are set by @code{open} and can be fetched
|
|
and changed with @code{fcntl}.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_APPEND
|
|
The bit that enables append mode for the file. If set, then all
|
|
@code{write} operations write the data at the end of the file, extending
|
|
it, regardless of the current file position. This is the only reliable
|
|
way to append to a file. In append mode, you are guaranteed that the
|
|
data you write will always go to the current end of the file, regardless
|
|
of other processes writing to the file. Conversely, if you simply set
|
|
the file position to the end of file and write, then another process can
|
|
extend the file after you set the file position but before you write,
|
|
resulting in your data appearing someplace before the real end of file.
|
|
@end deftypevr
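
As an illustration of append mode, here is a minimal sketch of a
function that adds one line to a file.  The name @code{append_line} and
the permission bits are assumptions made for the example; it also
treats a short write as a failure, which a real program might prefer to
retry.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Append the string LINE to the file PATH, creating the file if
   necessary.  Return 0 on success, -1 on failure.
   The name append_line is hypothetical.  */
int
append_line (const char *path, const char *line)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
  if (fd == -1)
    return -1;

  /* In append mode the data goes to the current end of the file,
     whatever the file position was before the call.  */
  size_t len = strlen (line);
  int result = (write (fd, line, len) == (ssize_t) len) ? 0 : -1;

  if (close (fd) == -1)
    result = -1;
  return result;
}
```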
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int O_NONBLOCK
|
|
The bit that enables nonblocking mode for the file. If this bit is set,
|
|
@code{read} requests on the file can return immediately with a failure
|
|
status if there is no input immediately available, instead of blocking.
|
|
Likewise, @code{write} requests can also return immediately with a
|
|
failure status if the output can't be written immediately.
|
|
|
|
Note that the @code{O_NONBLOCK} flag is overloaded as both an I/O
|
|
operating mode and a file name translation flag; @pxref{Open-time Flags}.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int O_NDELAY
|
|
This is an obsolete name for @code{O_NONBLOCK}, provided for
|
|
compatibility with BSD. It is not defined by the POSIX.1 standard.
|
|
@end deftypevr
|
|
|
|
The remaining operating modes are BSD and GNU extensions. They exist only
|
|
on some systems. On other systems, these macros are not defined.
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int O_ASYNC
|
|
The bit that enables asynchronous input mode. If set, then @code{SIGIO}
|
|
signals will be generated when input is available. @xref{Interrupt Input}.
|
|
|
|
Asynchronous input mode is a BSD feature.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int O_FSYNC
|
|
The bit that enables synchronous writing for the file. If set, each
|
|
@code{write} call will make sure the data is reliably stored on disk before
|
|
returning. @c !!! xref fsync
|
|
|
|
Synchronous writing is a BSD feature.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int O_SYNC
|
|
This is another name for @code{O_FSYNC}. They have the same value.
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment GNU
|
|
@deftypevr Macro int O_NOATIME
|
|
If this bit is set, @code{read} will not update the access time of the
|
|
file. @xref{File Times}. This is used by programs that do backups, so
|
|
that backing a file up does not count as reading it.
|
|
Only the owner of the file or the superuser may use this bit.
|
|
|
|
This is a GNU extension.
|
|
@end deftypevr
|
|
|
|
@node Getting File Status Flags
|
|
@subsection Getting and Setting File Status Flags
|
|
|
|
The @code{fcntl} function can fetch or change file status flags.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_GETFL
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
read the file status flags for the open file with descriptor
|
|
@var{filedes}.
|
|
|
|
The normal return value from @code{fcntl} with this command is a
|
|
nonnegative number which can be interpreted as the bitwise OR of the
|
|
individual flags. Since the file access modes are not single-bit values,
|
|
you can mask off other bits in the returned flags with @code{O_ACCMODE}
|
|
to compare them.
|
|
|
|
In case of an error, @code{fcntl} returns @math{-1}. The following
|
|
@code{errno} error conditions are defined for this command:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{filedes} argument is invalid.
|
|
@end table
|
|
@end deftypevr
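
For instance, masking with @code{O_ACCMODE} lets a function find out
whether a descriptor it was handed can be written to.  This is a
minimal sketch; the name @code{is_writable} is hypothetical.

```c
#include <fcntl.h>

/* Return 1 if FD is open for writing, 0 if it is not,
   or -1 if the flags cannot be read (errno is set).
   The name is_writable is hypothetical.  */
int
is_writable (int fd)
{
  int flags = fcntl (fd, F_GETFL, 0);
  if (flags == -1)
    return -1;

  /* Compare the whole access-mode field, not individual bits.  */
  int mode = flags & O_ACCMODE;
  return mode == O_WRONLY || mode == O_RDWR;
}
```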
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_SETFL
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to set
|
|
the file status flags for the open file corresponding to the
|
|
@var{filedes} argument. This command requires a third @code{int}
|
|
argument to specify the new flags, so the call looks like this:
|
|
|
|
@smallexample
|
|
fcntl (@var{filedes}, F_SETFL, @var{new-flags})
|
|
@end smallexample
|
|
|
|
You can't change the access mode for the file in this way; that is,
|
|
whether the file descriptor was opened for reading or writing.
|
|
|
|
The normal return value from @code{fcntl} with this command is an
|
|
unspecified value other than @math{-1}, which indicates an error. The
|
|
error conditions are the same as for the @code{F_GETFL} command.
|
|
@end deftypevr
|
|
|
|
If you want to modify the file status flags, you should get the current
|
|
flags with @code{F_GETFL} and modify the value. Don't assume that the
|
|
flags listed here are the only ones that are implemented; your program
|
|
may be run years from now and more flags may exist then. For example,
|
|
here is a function to set or clear the flag @code{O_NONBLOCK} without
|
|
altering any other flags:
|
|
|
|
@smallexample
|
|
@group
|
|
/* @r{Set the @code{O_NONBLOCK} flag of @var{desc} if @var{value} is nonzero,}
|
|
@r{or clear the flag if @var{value} is 0.}
|
|
@r{Return 0 on success, or -1 on error with @code{errno} set.} */
|
|
|
|
int
|
|
set_nonblock_flag (int desc, int value)
|
|
@{
|
|
int oldflags = fcntl (desc, F_GETFL, 0);
|
|
/* @r{If reading the flags failed, return error indication now.} */
|
|
if (oldflags == -1)
|
|
return -1;
|
|
/* @r{Set just the flag we want to set.} */
|
|
if (value != 0)
|
|
oldflags |= O_NONBLOCK;
|
|
else
|
|
oldflags &= ~O_NONBLOCK;
|
|
/* @r{Store modified flag word in the descriptor.} */
|
|
return fcntl (desc, F_SETFL, oldflags);
|
|
@}
|
|
@end group
|
|
@end smallexample
|
|
|
|
@node File Locks
|
|
@section File Locks
|
|
|
|
@cindex file locks
|
|
@cindex record locking
|
|
The remaining @code{fcntl} commands are used to support @dfn{record
|
|
locking}, which permits multiple cooperating programs to prevent each
|
|
other from simultaneously accessing parts of a file in error-prone
|
|
ways.
|
|
|
|
@cindex exclusive lock
|
|
@cindex write lock
|
|
An @dfn{exclusive} or @dfn{write} lock gives a process exclusive access
|
|
for writing to the specified part of the file. While a write lock is in
|
|
place, no other process can lock that part of the file.
|
|
|
|
@cindex shared lock
|
|
@cindex read lock
|
|
A @dfn{shared} or @dfn{read} lock prohibits any other process from
|
|
requesting a write lock on the specified part of the file. However,
|
|
other processes can request read locks.
|
|
|
|
The @code{read} and @code{write} functions do not actually check to see
|
|
whether there are any locks in place. If you want to implement a
|
|
locking protocol for a file shared by multiple processes, your application
|
|
must do explicit @code{fcntl} calls to request and clear locks at the
|
|
appropriate points.
|
|
|
|
Locks are associated with processes. A process can only have one kind
|
|
of lock set for each byte of a given file. When any file descriptor for
|
|
that file is closed by the process, all of the locks that process holds
|
|
on that file are released, even if the locks were made using other
|
|
descriptors that remain open. Likewise, locks are released when a
|
|
process exits, and are not inherited by child processes created using
|
|
@code{fork} (@pxref{Creating a Process}).
|
|
|
|
When making a lock, use a @code{struct flock} to specify what kind of
|
|
lock and where. This data type and the associated macros for the
|
|
@code{fcntl} function are declared in the header file @file{fcntl.h}.
|
|
@pindex fcntl.h
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftp {Data Type} {struct flock}
|
|
This structure is used with the @code{fcntl} function to describe a file
|
|
lock. It has these members:
|
|
|
|
@table @code
|
|
@item short int l_type
|
|
Specifies the type of the lock; one of @code{F_RDLCK}, @code{F_WRLCK}, or
|
|
@code{F_UNLCK}.
|
|
|
|
@item short int l_whence
|
|
This corresponds to the @var{whence} argument to @code{fseek} or
|
|
@code{lseek}, and specifies what the offset is relative to. Its value
|
|
can be one of @code{SEEK_SET}, @code{SEEK_CUR}, or @code{SEEK_END}.
|
|
|
|
@item off_t l_start
|
|
This specifies the offset of the start of the region to which the lock
|
|
applies, and is given in bytes relative to the point specified by
|
|
the @code{l_whence} member.
|
|
|
|
@item off_t l_len
|
|
This specifies the length of the region to be locked. A value of
|
|
@code{0} is treated specially; it means the region extends to the end of
|
|
the file.
|
|
|
|
@item pid_t l_pid
|
|
This field is the process ID (@pxref{Process Creation Concepts}) of the
|
|
process holding the lock. It is filled in by calling @code{fcntl} with
|
|
the @code{F_GETLK} command, but is ignored when making a lock.
|
|
@end table
|
|
@end deftp
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_GETLK
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
specify that it should get information about a lock. This command
|
|
requires a third argument of type @w{@code{struct flock *}} to be passed
|
|
to @code{fcntl}, so that the form of the call is:
|
|
|
|
@smallexample
|
|
fcntl (@var{filedes}, F_GETLK, @var{lockp})
|
|
@end smallexample
|
|
|
|
If there is a lock already in place that would block the lock described
|
|
by the @var{lockp} argument, information about that lock overwrites
|
|
@code{*@var{lockp}}. Existing locks are not reported if they are
|
|
compatible with making a new lock as specified. Thus, you should
|
|
specify a lock type of @code{F_WRLCK} if you want to find out about both
|
|
read and write locks, or @code{F_RDLCK} if you want to find out about
|
|
write locks only.
|
|
|
|
There might be more than one lock affecting the region specified by the
|
|
@var{lockp} argument, but @code{fcntl} only returns information about
|
|
one of them. The @code{l_whence} member of the @var{lockp} structure is
|
|
set to @code{SEEK_SET} and the @code{l_start} and @code{l_len} fields
|
|
are set to identify the locked region.
|
|
|
|
If no lock applies, the only change to the @var{lockp} structure is to
|
|
update the @code{l_type} to a value of @code{F_UNLCK}.
|
|
|
|
The normal return value from @code{fcntl} with this command is an
|
|
unspecified value other than @math{-1}, which is reserved to indicate an
|
|
error. The following @code{errno} error conditions are defined for
|
|
this command:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{filedes} argument is invalid.
|
|
|
|
@item EINVAL
|
|
Either the @var{lockp} argument doesn't specify valid lock information,
|
|
or the file associated with @var{filedes} doesn't support locks.
|
|
@end table
|
|
@end deftypevr
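
As a sketch of how @code{F_GETLK} might be used, the following function
asks whether any other process holds a lock that would conflict with a
write lock on the whole file.  The function name
@code{find_lock_holder} and its return convention are assumptions made
for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the process ID of a process holding a lock that would block
   a write lock on the whole file open as FD, 0 if there is no such
   lock, or -1 on error.  Locks held by the calling process are never
   reported.  The name find_lock_holder is hypothetical.  */
pid_t
find_lock_holder (int fd)
{
  struct flock lock;
  memset (&lock, 0, sizeof lock);
  lock.l_type = F_WRLCK;     /* conflicts with read and write locks */
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;            /* zero means: through end of file */

  if (fcntl (fd, F_GETLK, &lock) == -1)
    return -1;
  if (lock.l_type == F_UNLCK)
    return 0;
  return lock.l_pid;
}
```

Keep in mind that the answer can be out of date by the time the caller
looks at it; another process may set or release a lock in between.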
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_SETLK
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
specify that it should set or clear a lock. This command requires a
|
|
third argument of type @w{@code{struct flock *}} to be passed to
|
|
@code{fcntl}, so that the form of the call is:
|
|
|
|
@smallexample
|
|
fcntl (@var{filedes}, F_SETLK, @var{lockp})
|
|
@end smallexample
|
|
|
|
If the process already has a lock on any part of the region, the old lock
|
|
on that part is replaced with the new lock. You can remove a lock
|
|
by specifying a lock type of @code{F_UNLCK}.
|
|
|
|
If the lock cannot be set, @code{fcntl} returns immediately with a value
|
|
of @math{-1}. This function does not block waiting for other processes
|
|
to release locks.  If @code{fcntl} succeeds, it returns a value other
|
|
than @math{-1}.
|
|
|
|
The following @code{errno} error conditions are defined for this
|
|
command:
|
|
|
|
@table @code
|
|
@item EAGAIN
|
|
@itemx EACCES
|
|
The lock cannot be set because it is blocked by an existing lock on the
|
|
file. Some systems use @code{EAGAIN} in this case, and other systems
|
|
use @code{EACCES}; your program should treat them alike, after
|
|
@code{F_SETLK}. (The GNU system always uses @code{EAGAIN}.)
|
|
|
|
@item EBADF
|
|
Either: the @var{filedes} argument is invalid; you requested a read lock
|
|
but the @var{filedes} is not open for read access; or, you requested a
|
|
write lock but the @var{filedes} is not open for write access.
|
|
|
|
@item EINVAL
|
|
Either the @var{lockp} argument doesn't specify valid lock information,
|
|
or the file associated with @var{filedes} doesn't support locks.
|
|
|
|
@item ENOLCK
|
|
The system has run out of file lock resources; there are already too
|
|
many file locks in place.
|
|
|
|
Well-designed file systems never report this error, because they have no
|
|
limitation on the number of locks. However, you must still take account
|
|
of the possibility of this error, as it could result from network access
|
|
to a file system on another machine.
|
|
@end table
|
|
@end deftypevr
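
The following sketch wraps @code{F_SETLK} so that the caller can tell
``someone else holds the lock'' apart from other failures, treating
@code{EAGAIN} and @code{EACCES} alike as recommended above.  The name
@code{try_lock_region} and its return convention are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to apply a lock of kind TYPE (F_RDLCK, F_WRLCK or F_UNLCK) to
   LEN bytes starting at offset START of the file open as FD.
   Return 1 if the request succeeded, 0 if it is blocked by another
   process's lock, or -1 on any other error.
   The name try_lock_region is hypothetical.  */
int
try_lock_region (int fd, short type, off_t start, off_t len)
{
  struct flock lock;
  memset (&lock, 0, sizeof lock);
  lock.l_type = type;
  lock.l_whence = SEEK_SET;
  lock.l_start = start;
  lock.l_len = len;

  if (fcntl (fd, F_SETLK, &lock) != -1)
    return 1;
  /* Systems differ in which of these two codes they report.  */
  if (errno == EAGAIN || errno == EACCES)
    return 0;
  return -1;
}
```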
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@deftypevr Macro int F_SETLKW
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
specify that it should set or clear a lock. It is just like the
|
|
@code{F_SETLK} command, but causes the process to block (or wait)
|
|
until the request can be satisfied.
|
|
|
|
This command requires a third argument of type @code{struct flock *}, as
|
|
for the @code{F_SETLK} command.
|
|
|
|
The @code{fcntl} return values and errors are the same as for the
|
|
@code{F_SETLK} command, but these additional @code{errno} error conditions
|
|
are defined for this command:
|
|
|
|
@table @code
|
|
@item EINTR
|
|
The function was interrupted by a signal while it was waiting.
|
|
@xref{Interrupted Primitives}.
|
|
|
|
@item EDEADLK
|
|
The specified region is being locked by another process. But that
|
|
process is waiting to lock a region which the current process has
|
|
locked, so waiting for the lock would result in deadlock. The system
|
|
does not guarantee that it will detect all such conditions, but it lets
|
|
you know if it notices one.
|
|
@end table
|
|
@end deftypevr
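
A waiting lock request can be sketched like this.  Retrying after
@code{EINTR} is a design choice of the example, not a requirement; a
program that uses signals to impose a timeout would return instead.
The name @code{lock_wait} is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Apply a lock of kind TYPE to the whole file open as FD, waiting
   until other processes release conflicting locks.
   Return 0 on success, or -1 with errno set (for example EDEADLK).
   The name lock_wait is hypothetical.  */
int
lock_wait (int fd, short type)
{
  struct flock lock;
  memset (&lock, 0, sizeof lock);
  lock.l_type = type;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;            /* zero means: through end of file */

  /* Start waiting again if a signal handler interrupted the call.  */
  while (fcntl (fd, F_SETLKW, &lock) == -1)
    if (errno != EINTR)
      return -1;
  return 0;
}
```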
|
|
|
|
|
|
The following macros are defined for use as values for the @code{l_type}
|
|
member of the @code{flock} structure. The values are integer constants.
|
|
|
|
@table @code
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@vindex F_RDLCK
|
|
@item F_RDLCK
|
|
This macro is used to specify a read (or shared) lock.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@vindex F_WRLCK
|
|
@item F_WRLCK
|
|
This macro is used to specify a write (or exclusive) lock.
|
|
|
|
@comment fcntl.h
|
|
@comment POSIX.1
|
|
@vindex F_UNLCK
|
|
@item F_UNLCK
|
|
This macro is used to specify that the region is unlocked.
|
|
@end table
|
|
|
|
As an example of a situation where file locking is useful, consider a
|
|
program that can be run simultaneously by several different users, that
|
|
logs status information to a common file. One example of such a program
|
|
might be a game that uses a file to keep track of high scores. Another
|
|
example might be a program that records usage or accounting information
|
|
for billing purposes.
|
|
|
|
Having multiple copies of the program simultaneously writing to the
|
|
file could cause the contents of the file to become mixed up. But
|
|
you can prevent this kind of problem by setting a write lock on the
|
|
file before actually writing to the file.
|
|
|
|
If the program also needs to read the file and wants to make sure that
|
|
the contents of the file are in a consistent state, then it can also use
|
|
a read lock. While the read lock is set, no other process can lock
|
|
that part of the file for writing.
|
|
|
|
@c ??? This section could use an example program.
|
|
|
|
Remember that file locks are only a @emph{voluntary} protocol for
|
|
controlling access to a file. There is still potential for access to
|
|
the file by programs that don't use the lock protocol.
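
The shared log file scenario described above could be handled along
these lines.  This is a minimal sketch under stated assumptions: the
names @code{lock_whole_file} and @code{log_record} are hypothetical,
the file is created readable and writable by its owner only, and a
short write is simply reported as a failure.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Set (F_WRLCK, F_RDLCK) or release (F_UNLCK) a lock on the whole
   file open as FD, waiting if necessary.  Hypothetical helper.  */
static int
lock_whole_file (int fd, short type)
{
  struct flock lock;
  memset (&lock, 0, sizeof lock);
  lock.l_type = type;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;
  return fcntl (fd, F_SETLKW, &lock);
}

/* Append RECORD to the shared file PATH while holding a write lock,
   so that cooperating processes do not interleave their records.
   Return 0 on success, -1 on failure.  Hypothetical function.  */
int
log_record (const char *path, const char *record)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
  if (fd == -1)
    return -1;

  int result = -1;
  if (lock_whole_file (fd, F_WRLCK) != -1)
    {
      size_t len = strlen (record);
      if (write (fd, record, len) == (ssize_t) len)
        result = 0;
      /* Closing the descriptor would release the lock as well.  */
      lock_whole_file (fd, F_UNLCK);
    }

  if (close (fd) == -1)
    result = -1;
  return result;
}
```

A caller might write @code{log_record ("scores", "alice 100\n")}.  The
protection only works if every program that writes the file goes
through the same locking steps.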
|
|
|
|
@node Interrupt Input
|
|
@section Interrupt-Driven Input
|
|
|
|
@cindex interrupt-driven input
|
|
If you set the @code{O_ASYNC} status flag on a file descriptor
|
|
(@pxref{File Status Flags}), a @code{SIGIO} signal is sent whenever
|
|
input or output becomes possible on that file descriptor. The process
|
|
or process group to receive the signal can be selected by using the
|
|
@code{F_SETOWN} command to the @code{fcntl} function. If the file
|
|
descriptor is a socket, this also selects the recipient of @code{SIGURG}
|
|
signals that are delivered when out-of-band data arrives on that socket;
|
|
see @ref{Out-of-Band Data}. (@code{SIGURG} is sent in any situation
|
|
where @code{select} would report the socket as having an ``exceptional
|
|
condition''. @xref{Waiting for I/O}.)
|
|
|
|
If the file descriptor corresponds to a terminal device, then @code{SIGIO}
|
|
signals are sent to the foreground process group of the terminal.
|
|
@xref{Job Control}.
|
|
|
|
@pindex fcntl.h
|
|
The symbols in this section are defined in the header file
|
|
@file{fcntl.h}.
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int F_GETOWN
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
specify that it should get information about the process or process
|
|
group to which @code{SIGIO} signals are sent. (For a terminal, this is
|
|
actually the foreground process group ID, which you can get using
|
|
@code{tcgetpgrp}; see @ref{Terminal Access Functions}.)
|
|
|
|
The return value is interpreted as a process ID; if negative, its
|
|
absolute value is the process group ID.
|
|
|
|
The following @code{errno} error condition is defined for this command:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{filedes} argument is invalid.
|
|
@end table
|
|
@end deftypevr
|
|
|
|
@comment fcntl.h
|
|
@comment BSD
|
|
@deftypevr Macro int F_SETOWN
|
|
This macro is used as the @var{command} argument to @code{fcntl}, to
|
|
specify that it should set the process or process group to which
|
|
@code{SIGIO} signals are sent. This command requires a third argument
|
|
of type @code{pid_t} to be passed to @code{fcntl}, so that the form of
|
|
the call is:
|
|
|
|
@smallexample
|
|
fcntl (@var{filedes}, F_SETOWN, @var{pid})
|
|
@end smallexample
|
|
|
|
The @var{pid} argument should be a process ID. You can also pass a
|
|
negative number whose absolute value is a process group ID.
|
|
|
|
The return value from @code{fcntl} with this command is @math{-1}
|
|
in case of error and some other value if successful. The following
|
|
@code{errno} error conditions are defined for this command:
|
|
|
|
@table @code
|
|
@item EBADF
|
|
The @var{filedes} argument is invalid.
|
|
|
|
@item ESRCH
|
|
There is no process or process group corresponding to @var{pid}.
|
|
@end table
|
|
@end deftypevr
|
|
|
|
@c ??? This section could use an example program.
|
|
|
|
@node IOCTLs
|
|
@section Generic I/O Control Operations
|
|
@cindex generic i/o control operations
|
|
@cindex IOCTLs
|
|
|
|
The GNU system can handle most input/output operations on many different
|
|
devices and objects in terms of a few file primitives: @code{read},
|
|
@code{write} and @code{lseek}. However, most devices also have a few
|
|
peculiar operations which do not fit into this model, such as:
|
|
|
|
@itemize @bullet
|
|
|
|
@item
|
|
Changing the character font used on a terminal.
|
|
|
|
@item
|
|
Telling a magnetic tape system to rewind or fast forward.  (Since tapes
cannot move in byte increments, @code{lseek} is inapplicable.)
|
|
|
|
@item
|
|
Ejecting a disk from a drive.
|
|
|
|
@item
|
|
Playing an audio track from a CD-ROM drive.
|
|
|
|
@item
|
|
Maintaining routing tables for a network.
|
|
|
|
@end itemize
|
|
|
|
Although some of these objects, such as sockets and
terminals@footnote{Actually, the terminal-specific functions are
implemented with IOCTLs on many platforms.}, have special functions of
their own, it would not be practical to create functions for all these
cases.
|
|
|
|
Instead, these minor operations, known as @dfn{IOCTL}s, are assigned code
|
|
numbers and multiplexed through the @code{ioctl} function, defined in
|
|
@file{sys/ioctl.h}.  The code numbers themselves are defined in many
|
|
different headers.
|
|
|
|
@deftypefun int ioctl (int @var{filedes}, int @var{command}, @dots{})
|
|
|
|
The @code{ioctl} function performs the generic I/O operation
|
|
@var{command} on @var{filedes}.
|
|
|
|
A third argument is usually present, either a single number or a pointer
|
|
to a structure. The meaning of this argument, the returned value, and
|
|
any error codes depend upon the command used.  Often @math{-1} is
|
|
returned for a failure.
|
|
|
|
@end deftypefun
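
As a small illustration of the calling convention, the sketch below
uses the @code{FIONREAD} request, which asks how many bytes are waiting
to be read.  @code{FIONREAD} is not described in this section; it is
assumed here to be available through @file{sys/ioctl.h}, as it is on
BSD and Linux-based systems, and to take a pointer to @code{int} as its
third argument.  The name @code{bytes_available} is hypothetical.

```c
#include <sys/ioctl.h>

/* Return the number of bytes that can be read from FD right now,
   or -1 on error.  FIONREAD is assumed to be provided by
   <sys/ioctl.h> on this system.
   The name bytes_available is hypothetical.  */
int
bytes_available (int fd)
{
  int count = 0;
  /* For this request the third argument points to an int that
     receives the answer.  */
  if (ioctl (fd, FIONREAD, &count) == -1)
    return -1;
  return count;
}
```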
|
|
|
|
On some systems, IOCTLs used by different devices share the same numbers.
|
|
Thus, although use of an inappropriate IOCTL @emph{usually} only produces
|
|
an error, you should not attempt to use device-specific IOCTLs on an
|
|
unknown device.
|
|
|
|
Most IOCTLs are OS-specific and/or only used in special system utilities,
|
|
and are thus beyond the scope of this document. For an example of the use
|
|
of an IOCTL, see @ref{Out-of-Band Data}.
|