6599da043e
From-SVN: r14877
548 lines
12 KiB
Plaintext
548 lines
12 KiB
Plaintext
.TH DBZ 3Z "3 Feb 1991"
|
|
.BY "C News"
|
|
.SH NAME
|
|
dbminit, fetch, store, dbmclose \- somewhat dbm-compatible database routines
|
|
.br
|
|
dbzfresh, dbzagain, dbzfetch, dbzstore \- database routines
|
|
.br
|
|
dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug \- database routines
|
|
.SH SYNOPSIS
|
|
.nf
|
|
.B #include <dbz.h>
|
|
.PP
|
|
.B dbminit(base)
|
|
.B char *base;
|
|
.PP
|
|
.B datum
|
|
.B fetch(key)
|
|
.B datum key;
|
|
.PP
|
|
.B store(key, value)
|
|
.B datum key;
|
|
.B datum value;
|
|
.PP
|
|
.B dbmclose()
|
|
.PP
|
|
.B dbzfresh(base, size, fieldsep, cmap, tagmask)
|
|
.B char *base;
|
|
.B long size;
|
|
.B int fieldsep;
|
|
.B int cmap;
|
|
.B long tagmask;
|
|
.PP
|
|
.B dbzagain(base, oldbase)
|
|
.B char *base;
|
|
.B char *oldbase;
|
|
.PP
|
|
.B datum
|
|
.B dbzfetch(key)
|
|
.B datum key;
|
|
.PP
|
|
.B dbzstore(key, value)
|
|
.B datum key;
|
|
.B datum value;
|
|
.PP
|
|
.B dbzsync()
|
|
.PP
|
|
.B long
|
|
.B dbzsize(nentries)
|
|
.B long nentries;
|
|
.PP
|
|
.B dbzincore(newvalue)
|
|
.PP
|
|
.B dbzcancel()
|
|
.PP
|
|
.B dbzdebug(newvalue)
|
|
.SH DESCRIPTION
|
|
These functions provide an indexing system for rapid random access to a
|
|
text file (the
|
|
.I base
|
|
.IR file ).
|
|
Subject to certain constraints, they are call-compatible with
|
|
.IR dbm (3),
|
|
although they also provide some extensions.
|
|
(Note that they are
|
|
.I not
|
|
file-compatible with
|
|
.I dbm
|
|
or any variant thereof.)
|
|
.PP
|
|
In principle,
|
|
.I dbz
|
|
stores key-value pairs, where both key and value are arbitrary sequences
|
|
of bytes, specified to the functions by
|
|
values of type
|
|
.IR datum ,
|
|
typedefed in the header file to be a structure with members
|
|
.I dptr
|
|
(a value of type
|
|
.I char *
|
|
pointing to the bytes)
|
|
and
|
|
.I dsize
|
|
(a value of type
|
|
.I int
|
|
indicating how long the byte sequence is).
|
|
.PP
|
|
In practice,
|
|
.I dbz
|
|
is more restricted than
|
|
.IR dbm .
|
|
A
|
|
.I dbz
|
|
database
|
|
must be an index into a base file,
|
|
with the database
|
|
.IR value s
|
|
being
|
|
.IR fseek (3)
|
|
offsets into the base file.
|
|
Each such
|
|
.I value
|
|
must ``point to'' a place in the base file where the corresponding
|
|
.I key
|
|
sequence is found.
|
|
A key can be no longer than
|
|
.SM DBZMAXKEY
|
|
(a constant defined in the header file) bytes.
|
|
No key can be an initial subsequence of another,
|
|
which in most applications requires that keys be
|
|
either bracketed or terminated in some way (see the
|
|
discussion of the
|
|
.I fieldsep
|
|
parameter of
|
|
.IR dbzfresh ,
|
|
below,
|
|
for a fine point on terminators).
|
|
.PP
|
|
.I Dbminit
|
|
opens a database,
|
|
an index into the base file
|
|
.IR base ,
|
|
consisting of files
|
|
.IB base .dir
|
|
and
|
|
.IB base .pag
|
|
which must already exist.
|
|
(If the database is new, they should be zero-length files.)
|
|
Subsequent accesses go to that database until
|
|
.I dbmclose
|
|
is called to close the database.
|
|
The base file need not exist at the time of the
|
|
.IR dbminit ,
|
|
but it must exist before accesses are attempted.
|
|
.PP
|
|
.I Fetch
|
|
searches the database for the specified
|
|
.IR key ,
|
|
returning the corresponding
|
|
.IR value
|
|
if any.
|
|
.I Store
|
|
stores the
|
|
.IR key - value
|
|
pair in the database.
|
|
.I Store
|
|
will fail unless the database files are writeable.
|
|
See below for a complication arising from case mapping.
|
|
.PP
|
|
.I Dbzfresh
|
|
is a variant of
|
|
.I dbminit
|
|
for creating a new database with more control over details.
|
|
Unlike for
|
|
.IR dbminit ,
|
|
the database files need not exist:
|
|
they will be created if necessary,
|
|
and truncated in any case.
|
|
.PP
|
|
.IR Dbzfresh 's
|
|
.I size
|
|
parameter specifies the size of the first hash table within the database,
|
|
in key-value pairs.
|
|
Performance will be best if
|
|
.I size
|
|
is a prime number and
|
|
the number of key-value pairs stored in the database does not exceed
|
|
about 2/3 of
|
|
.IR size .
|
|
(The
|
|
.I dbzsize
|
|
function, given the expected number of key-value pairs,
|
|
will suggest a database size that meets these criteria.)
|
|
Assuming that an
|
|
.I fseek
|
|
offset is 4 bytes,
|
|
the
|
|
.B .pag
|
|
file will be
|
|
.RI 4* size
|
|
bytes
|
|
(the
|
|
.B .dir
|
|
file is tiny and roughly constant in size)
|
|
until
|
|
the number of key-value pairs exceeds about 80% of
|
|
.IR size .
|
|
(Nothing awful will happen if the database grows beyond 100% of
|
|
.IR size ,
|
|
but accesses will slow down somewhat and the
|
|
.B .pag
|
|
file will grow somewhat.)
|
|
.PP
|
|
.IR Dbzfresh 's
|
|
.I fieldsep
|
|
parameter specifies the field separator in the base file.
|
|
If this is not
|
|
NUL (0), and the last character of a
|
|
.I key
|
|
argument is NUL, that NUL compares equal to either a NUL or a
|
|
.I fieldsep
|
|
in the base file.
|
|
This permits use of NUL to terminate key strings without requiring that
|
|
NULs appear in the base file.
|
|
The
|
|
.I fieldsep
|
|
of a database created with
|
|
.I dbminit
|
|
is the horizontal-tab character.
|
|
.PP
|
|
For use in news systems, various forms of case mapping (e.g. uppercase to
|
|
lowercase) in keys are available.
|
|
The
|
|
.I cmap
|
|
parameter to
|
|
.I dbzfresh
|
|
is a single character specifying which of several mapping algorithms to use.
|
|
Available algorithms are:
|
|
.RS
|
|
.TP
|
|
.B 0
|
|
case-sensitive: no case mapping
|
|
.TP
|
|
.B B
|
|
same as
|
|
.B 0
|
|
.TP
|
|
.B NUL
|
|
same as
|
|
.B 0
|
|
.TP
|
|
.B =
|
|
case-insensitive: uppercase and lowercase equivalent
|
|
.TP
|
|
.B b
|
|
same as
|
|
.B =
|
|
.TP
|
|
.B C
|
|
RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions)
|
|
and case-insensitive after
|
|
.TP
|
|
.B ?
|
|
whatever the local default is, normally
|
|
.B C
|
|
.RE
|
|
.PP
|
|
Mapping algorithm
|
|
.B 0
|
|
(no mapping) is faster than the others and is overwhelmingly the correct
|
|
choice for most applications.
|
|
Unless compatibility constraints interfere, it is more efficient to pre-map
|
|
the keys, storing mapped keys in the base file, than to have
|
|
.I dbz
|
|
do the mapping on every search.
|
|
.PP
|
|
For historical reasons,
|
|
.I fetch
|
|
and
|
|
.I store
|
|
expect their
|
|
.I key
|
|
arguments to be pre-mapped, but expect unmapped keys in the base file.
|
|
.I Dbzfetch
|
|
and
|
|
.I dbzstore
|
|
do the same jobs but handle all case mapping internally,
|
|
so the customer need not worry about it.
|
|
.PP
|
|
.I Dbz
|
|
stores only the database
|
|
.IR value s
|
|
in its files, relying on reference to the base file to confirm a hit on a key.
|
|
References to the base file can be minimized, greatly speeding up searches,
|
|
if a little bit of information about the keys can be stored in the
|
|
.I dbz
|
|
files.
|
|
This is ``free'' if there are some unused bits in an
|
|
.I fseek
|
|
offset,
|
|
so that the offset can be
|
|
.I tagged
|
|
with some information about the key.
|
|
The
|
|
.I tagmask
|
|
parameter of
|
|
.I dbzfresh
|
|
allows specifying the location of unused bits.
|
|
.I Tagmask
|
|
should be a mask with
|
|
one group of
|
|
contiguous
|
|
.B 1
|
|
bits.
|
|
The bits in the mask should
|
|
be unused (0) in
|
|
.I most
|
|
offsets.
|
|
The bit immediately above the mask (the
|
|
.I flag
|
|
bit) should be unused (0) in
|
|
.I all
|
|
offsets;
|
|
.I (dbz)store
|
|
will reject attempts to store a key-value pair in which the
|
|
.I value
|
|
has the flag bit on.
|
|
Apart from this restriction, tagging is invisible to the user.
|
|
As a special case, a
|
|
.I tagmask
|
|
of 1 means ``no tagging'', for use with enormous base files or
|
|
on systems with unusual offset representations.
|
|
.PP
|
|
A
|
|
.I size
|
|
of 0
|
|
given to
|
|
.I dbzfresh
|
|
is synonymous with the local default;
|
|
the normal default is suitable for tables of 90-100,000
|
|
key-value pairs.
|
|
A
|
|
.I cmap
|
|
of 0 (NUL) is synonymous with the character
|
|
.BR 0 ,
|
|
signifying no case mapping
|
|
(note that the character
|
|
.B ?
|
|
specifies the local default mapping,
|
|
normally
|
|
.BR C ).
|
|
A
|
|
.I tagmask
|
|
of 0 is synonymous with the local default tag mask,
|
|
normally 0x7f000000 (specifying the top bit in a 32-bit offset
|
|
as the flag bit, and the next 7 bits as the mask,
|
|
which is suitable for base files up to circa 24MB).
|
|
Calling
|
|
.I dbminit(name)
|
|
with the database files empty is equivalent to calling
|
|
.IR dbzfresh(name,0,'\et','?',0) .
|
|
.PP
|
|
When databases are regenerated periodically, as in news,
|
|
it is simplest to pick the parameters for a new database based on the old one.
|
|
This also permits some memory of past sizes of the old database, so that
|
|
a new database size can be chosen to cover expected fluctuations.
|
|
.I Dbzagain
|
|
is a variant of
|
|
.I dbminit
|
|
for creating a new database as a new generation of an old database.
|
|
The database files for
|
|
.I oldbase
|
|
must exist.
|
|
.I Dbzagain
|
|
is equivalent to calling
|
|
.I dbzfresh
|
|
with the same field separator, case mapping, and tag mask as the old database,
|
|
and a
|
|
.I size
|
|
equal to the result of applying
|
|
.I dbzsize
|
|
to the largest number of entries in the
|
|
.I oldbase
|
|
database and its previous 10 generations.
|
|
.PP
|
|
When many accesses are being done by the same program,
|
|
.I dbz
|
|
is massively faster if its first hash table is in memory.
|
|
If an internal flag is 1,
|
|
an attempt is made to read the table in when
|
|
the database is opened, and
|
|
.I dbmclose
|
|
writes it out to disk again (if it was read successfully and
|
|
has been modified).
|
|
.I Dbzincore
|
|
sets the flag to
|
|
.I newvalue
|
|
(which should be 0 or 1)
|
|
and returns the previous value;
|
|
this does not affect the status of a database that has already been opened.
|
|
The default is 0.
|
|
The attempt to read the table in may fail due to memory shortage;
|
|
in this case
|
|
.I dbz
|
|
quietly falls back on its default behavior.
|
|
.IR Store s
|
|
to an in-memory database are not (in general) written out to the file
|
|
until
|
|
.IR dbmclose
|
|
or
|
|
.IR dbzsync ,
|
|
so if robustness in the presence of crashes
|
|
or concurrent accesses
|
|
is crucial, in-memory databases
|
|
should probably be avoided.
|
|
.PP
|
|
.I Dbzsync
|
|
causes all buffers etc. to be flushed out to the files.
|
|
It is typically used as a precaution against crashes or concurrent accesses
|
|
when a
|
|
.IR dbz -using
|
|
process will be running for a long time.
|
|
It is a somewhat expensive operation,
|
|
especially
|
|
for an in-memory database.
|
|
.PP
|
|
.I Dbzcancel
|
|
cancels any pending writes from buffers.
|
|
This is typically useful only for in-core databases, since writes are
|
|
otherwise done immediately.
|
|
Its main purpose is to let a child process, in the wake of a
|
|
.IR fork ,
|
|
do a
|
|
.I dbmclose
|
|
without writing its parent's data to disk.
|
|
.PP
|
|
If
|
|
.I dbz
|
|
has been compiled with debugging facilities available (which makes it
|
|
bigger and a bit slower),
|
|
.I dbzdebug
|
|
alters the value (and returns the previous value) of an internal flag
|
|
which (when 1; default is 0) causes
|
|
verbose and cryptic debugging output on standard output.
|
|
.PP
|
|
Concurrent reading of databases is fairly safe,
|
|
but there is no (inter)locking,
|
|
so concurrent updating is not.
|
|
.PP
|
|
The database files include a record of the byte order of the processor
|
|
creating the database, and accesses by processors with different byte
|
|
order will work, although they will be slightly slower.
|
|
Byte order is preserved by
|
|
.IR dbzagain .
|
|
However,
|
|
agreement on the size and internal structure of an
|
|
.I fseek
|
|
offset is necessary, as is consensus on
|
|
the character set.
|
|
.PP
|
|
An open database occupies three
|
|
.I stdio
|
|
streams and their corresponding file descriptors;
|
|
a fourth is needed for an in-memory database.
|
|
Memory consumption is negligible (except for
|
|
.I stdio
|
|
buffers) except for in-memory databases.
|
|
.SH SEE ALSO
|
|
dbz(1), dbm(3)
|
|
.SH DIAGNOSTICS
|
|
Functions returning
|
|
.I int
|
|
values return 0 for success, \-1 for failure.
|
|
Functions returning
|
|
.I datum
|
|
values return a value with
|
|
.I dptr
|
|
set to NULL for failure.
|
|
.I Dbminit
|
|
attempts to have
|
|
.I errno
|
|
set plausibly on return, but otherwise this is not guaranteed.
|
|
An
|
|
.I errno
|
|
of
|
|
.B EDOM
|
|
from
|
|
.I dbminit
|
|
indicates that the database did not appear to be in
|
|
.I dbz
|
|
format.
|
|
.SH HISTORY
|
|
The original
|
|
.I dbz
|
|
was written by
|
|
Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).
|
|
Later contributions by David Butler and Mark Moraes.
|
|
Extensive reworking,
|
|
including this documentation,
|
|
by Henry Spencer (henry@zoo.toronto.edu) as
|
|
part of the C News project.
|
|
Hashing function by Peter Honeyman.
|
|
.SH BUGS
|
|
The
|
|
.I dptr
|
|
members of returned
|
|
.I datum
|
|
values point to static storage which is overwritten by later calls.
|
|
.PP
|
|
Unlike
|
|
.IR dbm ,
|
|
.I dbz
|
|
will misbehave if an existing key-value pair is `overwritten' by
|
|
a new
|
|
.I (dbz)store
|
|
with the same key.
|
|
The user is responsible for avoiding this by using
|
|
.I (dbz)fetch
|
|
first to check for duplicates;
|
|
an internal optimization remembers the result of the
|
|
first search so there is minimal overhead in this.
|
|
.PP
|
|
Waiting until after
|
|
.I dbminit
|
|
to bring the base file into existence
|
|
will fail if
|
|
.IR chdir (2)
|
|
has been used meanwhile.
|
|
.PP
|
|
The RFC822 case mapper implements only a first approximation to the
|
|
hideously-complex RFC822 case rules.
|
|
.PP
|
|
The prime finder in
|
|
.I dbzsize
|
|
is not particularly quick.
|
|
.PP
|
|
Should implement the
|
|
.I dbm
|
|
functions
|
|
.IR delete ,
|
|
.IR firstkey ,
|
|
and
|
|
.IR nextkey .
|
|
.PP
|
|
On C implementations which trap integer overflow,
|
|
.I dbz
|
|
will refuse to
|
|
.I (dbz)store
|
|
an
|
|
.I fseek
|
|
offset equal to the greatest
|
|
representable
|
|
positive number,
|
|
as this would cause overflow in the biased representation used.
|
|
.PP
|
|
.I Dbzagain
|
|
perhaps ought to notice when many offsets
|
|
in the old database were
|
|
too big for
|
|
tagging, and shrink the tag mask to match.
|
|
.PP
|
|
Marking
|
|
.IR dbz 's
|
|
file descriptors
|
|
.RI close-on- exec
|
|
would be a better approach to the problem
|
|
.I dbzcancel
|
|
tries to address, but that's harder to do portably.
|