\input texinfo
@parindent=0pt
@setfilename gld
@c @@setchapternewpage odd
@settitle GLD, The GNU linker
@titlepage
@title{gld}
@subtitle{The gnu loader}
@sp 1
@subtitle Second Edition---gld version 2.0
@subtitle January 1991
@vskip 0pt plus 1filll
Copyright @copyright{} 1991 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions.
@author {Steve Chamberlain}
@author {Cygnus Support}
@author {steve@@cygnus.com}
@end titlepage
@node Top,,,
@comment node-name, next, previous, up
@ifinfo
This file documents the GNU linker gld.
@end ifinfo
@c chapter What does a linker do ?
@c chapter Command Language
@noindent
@chapter Overview
The @code{gld} command combines a number of object and archive files,
relocates their data and ties up symbol references. Often the last
step in building a new compiled program to run is a call to @code{gld}.
The @code{gld} command accepts Linker Command Language files in
a superset of AT&T's Link Editor Command Language syntax,
to provide explicit and total control over the linking process.
This version of @code{gld} uses the general purpose @code{bfd} libraries
to operate on object files. This allows @code{gld} to read and
write any of the formats supported by @code{bfd}; different
formats may be linked together to produce any available kind of object file.
Supported formats:
@itemize @bullet
@item
Sun3 68k a.out
@item
IEEE-695 68k Object Module Format
@item
Oasys 68k Binary Relocatable Object File Format
@item
Sun4 sparc a.out
@item
88k bcs coff
@item
i960 coff little endian
@item
i960 coff big endian
@item
i960 b.out little endian
@item
i960 b.out big endian
@item
s-records
@end itemize
When linking similar formats, @code{gld} maintains all debugging
information.
@chapter Command line options
@example
gld [ -Bstatic ] [ -D @var{datasize} ]
[ -c @var{filename} ]
[ -d ] | [ -dc ] | [ -dp ]
[ -i ]
[ -e @var{entry} ] [ -l @var{arch} ] [ -L @var{searchdir} ] [ -M ]
[ -N | -n | -z ] [ -noinhibit-exec ] [ -r ] [ -S ] [ -s ]
[ -f @var{fill} ]
[ -T @var{textorg} ] [ -Tdata @var{dataorg} ] [ -t ] [ -u @var{sym}]
[ -X ] [ -x ]
[-o @var{output} ] @var{objfiles}@dots{}
@end example
Command-line options to GNU @code{gld} may be specified in any order, and
may be repeated at will. For the most part, repeating an option with a
different argument will either have no further effect, or override prior
occurrences (those further to the left on the command line) of an
option.
The exceptions which may meaningfully be present several times
are @code{-L}, @code{-l}, and @code{-u}.
@var{objfiles} may follow, precede, or be mixed in with
command-line options; save that an @var{objfiles} argument may not be
placed between an option flag and its argument.
Option arguments must follow the option letter without intervening
whitespace, or be given as separate arguments immediately following the
option that requires them.
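As an illustrative sketch of this rule (the file, directory and library
names here are hypothetical), the following two command lines should be
equivalent; the first gives each option argument as a separate word, the
second attaches it directly to the option letter:
@example
gld -o prog -L /usr/local/lib main.o -l c
gld -oprog -L/usr/local/lib main.o -lc
@end example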
@table @code
@item @var{objfiles}@dots{}
The object files @var{objfiles} to be linked; at least one must be specified.
@item -Bstatic
This flag is accepted for command-line compatibility with the SunOS linker,
but has no effect on @code{gld}.
@item -c @var{commandfile}
Directs @code{gld} to read linkage commands from the file @var{commandfile}.
@item -D @var{datasize}
Use this option to specify a target size for the @code{data} segment of
your linked program. The option is only obeyed if @var{datasize} is
larger than the natural size of the program's @code{data} segment.
@var{datasize} must be an integer specified in hexadecimal.
@code{gld} will simply increase the size of the @code{data} segment,
padding the created gap with zeros, and reduce the size of the
@code{bss} segment to match.
@item -d
Force @code{gld} to assign space to common symbols
even if a relocatable output file is specified (@code{-r}).
@item -dc | -dp
These flags are accepted for command-line compatibility with the SunOS linker,
but have no effect on @code{gld}.
@item -e @var{entry}
Use @var{entry} as the explicit symbol for beginning execution of your
program, rather than the default entry point. If this symbol is
not specified, the symbol @code{start} is used as the entry address.
If there is no symbol called @code{start}, then the entry address
is set to the first address in the first output section
(usually the @samp{text} section).
@item -f @var{fill}
Sets the default fill pattern for ``holes'' in the output file to
the lowest two bytes of the expression specified.
@item -i
Produce an incremental link (same as option @code{-r}).
@item -l @var{arch}
Add an archive file @var{arch} to the list of files to link. This
option may be used any number of times. @code{gld} will search its
path-list for occurrences of @code{lib@var{arch}.a} for every @var{arch}
specified.
@c This also has a side effect of using the "c++ demangler" if we happen
@c to specify -llibg++. Document? pesch@@cygnus.com, 24jan91
@item -L @var{searchdir}
This command adds path @var{searchdir} to the
list of paths that @code{gld} will search for archive libraries. You
may use this option any number of times.
@c Should we make any attempt to list the standard paths searched
@c without listing? When hacking on a new system I often want to know
@c this, but this may not be the place... it's not constant across
@c systems, of course, which is what makes it interesting.
@c pesch@@cygnus.com, 24jan91.
@item -M
@itemx -m
Print (to the standard output file) a link map---diagnostic information
about where symbols are mapped by @code{gld}, and information on global
common storage allocation.
@item -N
Specifies readable and writable @code{text} and @code{data} sections. If
the output format supports Unix style magic numbers, then @code{OMAGIC} is set.
@item -n
Sets the text segment to be read only; @code{NMAGIC} is written
if possible.
@item -o @var{output}
@var{output} is a name for the program produced by @code{gld}; if this
option is not specified, the name @samp{a.out} is used by default.
@item -r
Generates relocatable output---i.e., generate an output file that can in
turn serve as input to @code{gld}. As a side effect, this option also
sets the output file's magic number to @code{OMAGIC}; see @samp{-N}. If this
option is not specified, an absolute file is produced.
@item -S
Omits debugger symbol information (but not all symbols) from the output file.
@item -s
Omits all symbol information from the output file.
@item -T @var{textorg}
@itemx -Ttext @var{textorg}
Use @var{textorg} as the starting address for the @code{text} segment of the
output file. Both forms of this option are equivalent. The option
argument must be a hexadecimal integer.
@item -Tdata @var{dataorg}
Use @var{dataorg} as the starting address for the @code{data} segment of
the output file. The option argument must be a hexadecimal integer.
@item -t
Prints names of input files as @code{gld} processes them.
@item -u @var{sym}
Forces @var{sym} to be entered in the output file as an undefined symbol.
This may, for example, trigger linking of additional modules from
standard libraries. @code{-u} may be repeated with different option
arguments to enter additional undefined symbols. This option is equivalent
to the @code{EXTERN} linker command.
@item -X
If @code{-s} or @code{-S} is also specified, delete only local symbols
beginning with @samp{L}.
@item -z
@code{-z} sets @code{ZMAGIC}, the default: the @code{text} segment is
read-only, demand pageable, and shared.
Specifying a relocatable output file (@code{-r}) will also set the magic
number to @code{OMAGIC}.
See description of @samp{-N}.
@end table
@chapter Command Language
The command language allows explicit control over the linkage process, allowing
specification of:
@itemize @bullet
@item
input files
@item
file formats
@item
output file format
@item
addresses of sections
@item
placement of common blocks
@item
and more
@end itemize
A command file may be supplied to the linker, either explicitly through the
@code{-c} option, or implicitly as an ordinary file. If the linker opens
a file which does not have a reasonable object or archive format, it tries
to read the file as if it were a command file.
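For instance, assuming a command file named @code{link.cmd} and an
object file @code{main.o} (both names are hypothetical), either of the
following invocations would be expected to make @code{gld} read the
commands in @code{link.cmd}. In the second form the linker only treats
the file as a command file after failing to recognize it as an object or
archive file:
@example
gld -c link.cmd -o prog main.o
gld -o prog main.o link.cmd
@end example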
@section Structure
To be added
@section Expressions
The syntax for expressions in the command language is identical to that of
C expressions, with the following features:
@itemize @bullet
@item
All expressions are evaluated as integers and
are of ``long'' or ``unsigned long'' type.
@item
All constants are integers.
@item
All of the C arithmetic operators are provided.
@item
Global variables may be referenced, defined and created.
@item
Built-in functions may be called.
@end itemize
@section Expression Evaluation
The linker uses ``lazy evaluation'' for expressions; it only
calculates an expression when absolutely necessary. For instance,
when the linker reads in the command file it has to know the
start address and length of each memory region for linkage to continue, so these
values are worked out immediately. Other values (such as symbol values) are not
known or needed until after storage allocation.
They are evaluated later, when the other
information, such as the sizes of output sections, is available for use in
the symbol assignment expression.
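As a hedged sketch of this ordering (the region name, its size and the
symbol name @code{_text_size} are chosen only for illustration), the
origin and length below are constants that can be worked out as soon as
the command file is read, whereas the assignment depends on the size of
an output section and so would not be evaluated until after allocation:
@example
MEMORY
@{
  ram : ORIGIN = 0x40000000, LENGTH = 16 * 4K
@}
_text_size = SIZEOF(.text);
@end example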
When a linker expression is evaluated and assigned to a variable it is given
either an absolute or a relocatable type. An absolute expression type
is one in which the symbol contains the value that it will have in the
output file; a relocatable expression type is one in which the value
is expressed as a fixed offset from the base of a section.
The type of the expression is controlled by its position in the script
file. A symbol assigned within a @code{SECTIONS} specification is
created relative to the base of the section; a symbol assigned in any
other place is created as an absolute symbol. Since a symbol created
within a @code{SECTIONS} specification is relative to the base of the
section, it will remain relocatable if relocatable output is requested.
A symbol may be created with an absolute value even when assigned
within a @code{SECTIONS} specification by using the absolute assignment
function @code{ABSOLUTE}. For example, to create an absolute symbol
whose address is the last byte of the output section @code{.data}:
@example
.data :
@{
*(.data)
_edata = ABSOLUTE(.) ;
@}
@end example
Unless quoted, symbol names start with a letter, underscore, point or
minus sign and may include any letters, underscores, digits, points,
and minus signs. Unquoted symbol names must not conflict with any
keywords. To specify a symbol which contains odd characters or has
the same name as a keyword surround it in double quotes:
@example
"SECTION" = 9;
"with a space" = "also with a space" + 10;
@end example
@subsection Integers
An octal integer is @samp{0} followed by zero or more of the octal
digits (@samp{01234567}).
A decimal integer starts with a non-zero digit followed by zero or
more digits (@samp{0123456789}).
A hexadecimal integer is @samp{0x} or @samp{0X} followed by one or
more hexadecimal digits chosen from @samp{0123456789abcdefABCDEF}.
Integers have the usual values. To denote a negative integer, use
the unary operator @samp{-} discussed under expressions.
Additionally the suffixes @code{K} and @code{M} may be used to multiply the
previous constant by 1024 or
@tex
$1024^2$
@end tex
respectively.
@example
_as_decimal = 57005;
_as_hex = 0xdead;
_as_octal = 0157255;
_4k_1 = 4K;
_4k_2 = 4096;
_4k_3 = 0x1000;
@end example
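The @code{M} suffix works the same way. As a small sketch (the symbol
names are made up for illustration), the following three assignments all
denote one megabyte, since 1024 * 1024 = 1048576 = 0x100000:
@example
_1m_1 = 1M;
_1m_2 = 1048576;
_1m_3 = 0x100000;
@end example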
@subsection Operators
The linker provides the standard C set of arithmetic operators, with
the standard bindings and precedence levels:
@example
Level      Associativity   Operators
(highest)
  1        left            ! - ~
  2        left            * / %
  3        left            + -
  4        left            >> <<
  5        left            == != > < <= >=
  6        left            &
  7        left            |
  8        left            &&
  9        left            ||
 10        right           ? :
 11        right           &= += -= *= /=
(lowest)
@end example
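As a brief illustration of these precedence levels (the symbol names are
hypothetical), multiplication binds more tightly than addition, addition
binds more tightly than the shift and bitwise operators, and the unary
operators bind most tightly of all, just as in C:
@example
_a = 2 + 3 * 4;
_b = 1 << 2 + 1;
_c = 0x1234 + 0xfff & ~0xfff;
@end example
Under these rules @code{_a} is 14, @code{_b} is 8 (that is, 1 << 3) and
@code{_c} is 0x2000 (that is, 0x2233 & ~0xfff).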
@section Built in Functions
The command language provides built in functions for use in
expressions in linkage scripts.
@table @bullet
@item @code{ALIGN(@var{exp})}
returns the result of the current location counter (@code{dot})
aligned to the next @var{exp} boundary, where @var{exp} is a power of
two. This is equivalent to @code{(. + @var{exp} -1) & ~(@var{exp}-1)}.
As an example, to align the output @code{.data} section to the
next 0x2000 byte boundary after the preceding section and to set a
variable within the section to the next 0x8000 boundary after the
input sections:
@example
.data ALIGN(0x2000) :@{
*(.data)
variable = ALIGN(0x8000);
@}
@end example
@item @code{ADDR(@var{section name})}
returns the absolute address of the named section if the section has
already been bound. In the following example, @code{symbol_1} and
@code{symbol_2} are assigned identical values:
@example
.output1:
@{
start_of_output_1 = .;
...
@}
.output:
@{
symbol_1 = ADDR(.output1);
symbol_2 = start_of_output_1;
@}
@end example
@item @code{SIZEOF(@var{section name})}
returns the size in bytes of the named section, if the section has
been allocated. In the following example, @code{symbol_1} and
@code{symbol_2} are assigned identical values:
@example
.output : @{
.start = . ;
...
.end = .;
@}
symbol_1 = .end - .start;
symbol_2 = SIZEOF(.output);
@end example
@item @code{DEFINED(@var{symbol name})}
Returns 1 if the symbol is in the linker global symbol table and is
defined, otherwise it returns 0. This example shows the setting of a
global symbol @code{begin} to the first location in the @code{.text}
section, only if there is no other symbol
called @code{begin} already:
@example
.text: @{
begin = DEFINED(begin) ? begin : . ;
...
@}
@end example
@end table
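As a worked example of the @code{ALIGN} formula given above, suppose the
location counter is 0x2345 and the requested alignment is 0x1000 (both
values are chosen only for illustration):
@example
(0x2345 + 0x1000 - 1) & ~(0x1000 - 1)
    = 0x3344 & ~0xfff
    = 0x3000
@end example
A location counter that is already on the boundary is left unchanged,
since (0x3000 + 0xfff) & ~0xfff is again 0x3000.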
@page
@section MEMORY Directive
The linker's default configuration is for all memory to be
allocatable. This state may be overridden by using the @code{MEMORY}
directive. The @code{MEMORY} directive describes the location and
size of blocks of memory in the target. Careful use can describe
memory regions which may or may not be used by the linker. The linker
does not shuffle sections to fit into the available regions, but does
move the requested sections into the correct regions and issue errors
when the regions become too full. The syntax is:
@example
MEMORY
@{
@var{name} (@var{attr}) : ORIGIN = @var{origin}, LENGTH = @var{len}
@dots{}
@}
@end example
@table @code
@item @var{name}
is a name used internally by the linker to refer to the region. Any
symbol name may be used. The region names are stored in a separate
name space, and will not conflict with symbols, filenames or section
names.
@item @var{attr}
is an optional list of attributes, parsed for compatibility with the
AT&T linker
but ignored by both the AT&T and the GNU linkers.
@item @var{origin}
is the start address of the region in physical memory, expressed as a
standard linker expression which must evaluate to a constant before
memory allocation is performed. The keyword @code{ORIGIN} may be
abbreviated to @code{org} or @code{o}.
@item @var{len}
is the size in bytes of the region as a standard linker expression.
The keyword @code{LENGTH} may be abbreviated to @code{len} or @code{l}.
@end table
For example, to specify that memory has two regions available for
allocation; one starting at 0 for 256k, and the other starting at
0x40000000 for four megabytes:
@example
MEMORY
@{
rom : ORIGIN= 0, LENGTH = 256K
ram : ORIGIN= 0x40000000, LENGTH = 4M
@}
@end example
If the combined output sections directed to a region are too big for
the region the linker will emit an error message.
@page
@section SECTIONS Directive
The @code{SECTIONS} directive
controls exactly where input sections are placed into output sections, their
order and to which output sections they are allocated.
When no @code{SECTIONS} directives are specified, the default action
of the linker is to place each input section into an identically named
output section in the order that the sections appear in the first
file, and then the order of the files.
The syntax of the @code{SECTIONS} directive is:
@example
SECTIONS
@{
@var{name} [ @var{options} ] : @{ @var{statements} @} [ = @var{fill expression} ] [ > @var{mem spec} ]
@dots{}
@}
@end example
@table @code
@item @var{name}
controls the name of the output section. In formats which only support
a limited number of sections, such as @code{a.out}, the name must be
one of the names supported by the format (in the case of a.out,
@code{.text}, @code{.data} or @code{.bss}). If the output format
supports any number of sections, but with numbers and not names (in
the case of IEEE), the name should be supplied as a quoted numeric
string. A section name may consist of any sequence of characters, but
any name which does not conform to the standard @code{gld} symbol name
syntax must be quoted. To copy sections 1 through 4 from an Oasys file
into the @code{.text} section of an @code{a.out} file, and sections 13
and 14 into the @code{.data} section:
@example
SECTIONS @{
.text :@{
*("1" "2" "3" "4")
@}
.data :@{
*("13" "14")
@}
@}
@end example
@item @var{fill expression}
If present this
expression sets the fill value. Any unallocated holes in the current output
section when written to the output file will
be filled with the two least significant bytes of the value, repeated as
necessary.
@page
@item @var{options}
the @var{options} parameter is a list of optional arguments specifying
attributes of the output section. They may be taken from the following
list:
@table @bullet{}
@item @var{addr expression}
forces the output section to be loaded at a specified address. The
address is specified as a standard linker expression. The following
example generates section @var{output} at location
@code{0x40000000}:
@example
SECTIONS @{
output 0x40000000: @{
...
@}
@}
@end example
Since the built in function @code{ALIGN} references the location
counter implicitly, a section may be located on a certain boundary by
using the @code{ALIGN} function in the expression. For example, to
locate the @code{.data} section on the next 8k boundary after the end
of the @code{.text} section:
@example
SECTIONS @{
.text : @{
...
@}
.data ALIGN(8K) : @{
...
@}
@}
@end example
@end table
@item @var{statements}
is a list of file names, input sections and assignments. These statements control what is placed into the
output section.
The syntax of a single @var{statement} is one of:
@table @bullet
@item @var{symbol} [ = | += | -= | *= | /= ] @var{expression} @code{;}
Global symbols may be created and have their values (addresses)
altered using the assignment statement. The linker tries to put off
the evaluation of an assignment until all the terms in the source
expression are known; for instance the sizes of sections cannot be
known until after allocation, so assignments dependent upon these are
not performed until after allocation. Some expressions, such as those
depending upon the location counter @code{dot} (@samp{.}), must be
evaluated during allocation. If the result of an expression is
required, but the value is not available, then an error results. For example:
@example
SECTIONS @{
text 9+this_isnt_constant:
@{
@}
@}
testscript:21: Non constant expression for initial address
@end example
@item @code{CREATE_OBJECT_SYMBOLS}
causes the linker to create a symbol for each input file and place it
into the specified section, set to the address of the first byte of
data written from the input file. For instance, with @code{a.out}
files it is conventional to have a symbol for each input file.
@example
SECTIONS @{
.text 0x2020 :
@{
CREATE_OBJECT_SYMBOLS
*(.text)
_etext = ALIGN(0x2000);
@}
@}
@end example
Supplied with four object files, @code{a.o}, @code{b.o}, @code{c.o},
and @code{d.o}, each built from a source file like this @code{a.c}:
@example
afunction() @{ @}
int adata=1;
int abss;
@end example
a run of @code{gld} could create a map:
@example
00000000 A __DYNAMIC
00004020 B _abss
00004000 D _adata
00002020 T _afunction
00004024 B _bbss
00004008 D _bdata
00002038 T _bfunction
00004028 B _cbss
00004010 D _cdata
00002050 T _cfunction
0000402c B _dbss
00004018 D _ddata
00002068 T _dfunction
00004020 D _edata
00004030 B _end
00004000 T _etext
00002020 t a.o
00002038 t b.o
00002050 t c.o
00002068 t d.o
@end example
@item @var{filename} @code{(} @var{section name list} @code{)}
This command allocates all the named sections from the input object
file supplied into the output section at the current point. Sections
are written in the order they appear in the list so:
@example
SECTIONS @{
.text 0x2020 :
@{
a.o(.data)
b.o(.data)
*(.text)
@}
.data :
@{
*(.data)
@}
.bss :
@{
*(.bss)
COMMON
@}
@}
@end example
will produce a map:
@example
insert here
@end example
@item @code{* (} @var{section name list} @code{)}
This command causes all sections from all input files which have not
yet been assigned output sections to be assigned the current output
section.
@item @var{filename} @code{[COMMON]}
This allocates all the common symbols from the specified file and places
them into the current output section.
@item @code{* [COMMON]}
This allocates all the common symbols from the files which have not
yet had their common symbols allocated and places them into the current
output section.
@item @var{filename}
A filename alone within a @code{SECTIONS} statement will cause all the
input sections from the file to be placed into the current output
section at the current location. If the file name has been mentioned
before with a section name list then only those
sections which have not yet been allocated are noted.
The following example reads all of the sections from file all.o and
places them at the start of output section @code{outputa} which starts
at location @code{0x10000}. All of the data from section @code{.input1} from
file foo.o is placed next into the same output section. All of
section @code{.input2} is read from foo.o and placed into output
section @code{outputb}. Next all of section @code{.input1} is read
from foo1.o. All of the remaining @code{.input1} and @code{.input2}
sections from any files are written to output section @code{outputc}.
@example
SECTIONS
@{
outputa 0x10000 :
@{
all.o
foo.o (.input1)
@}
outputb :
@{
foo.o (.input2)
foo1.o (.input1)
@}
outputc :
@{
*(.input1)
*(.input2)
@}
@}
@end example
@end table
@end table
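The syntax given at the start of this section allows an output section
to be followed by @code{>} and a memory specification. The following
sketch assumes that the memory specification is simply the name of a
region declared in a @code{MEMORY} directive; that assumption is not
spelled out above, so the exact form accepted may differ. The region
names and sizes are those of the earlier @code{MEMORY} example:
@example
MEMORY
@{
  rom : ORIGIN = 0, LENGTH = 256K
  ram : ORIGIN = 0x40000000, LENGTH = 4M
@}
SECTIONS
@{
  .text :
  @{
    *(.text)
  @} > rom
  .data :
  @{
    *(.data)
  @} > ram
@}
@end example
If the sections directed to @code{rom} grew beyond 256K, the linker
would be expected to report the overflow as an error rather than spill
into @code{ram}.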
@section Using the Location Counter
The special linker variable @code{dot}, @samp{.} always contains the
current output location counter. Since the @code{dot} always refers to
a location in an output section, it must always appear in an
expression within a @code{SECTIONS} directive. The @code{dot} symbol
may appear anywhere that an ordinary symbol may appear in an
expression, but its assignments have a side effect. Assigning a value
to the @code{dot} symbol will cause the location counter to be moved.
This may be used to create holes in the output section. The location
counter may never be moved backwards.
@example
SECTIONS
@{
output :
@{
file1(.text)
. = . + 1000;
file2(.text)
. += 1000;
file3(.text)
. -= 32;
file4(.text)
@} = 0x1234;
@}
@end example
In the previous example, @code{file1} is located at the beginning of
the output section, then there is a 1000 byte gap, filled with 0x1234.
Then @code{file2} appears, also with a 1000 byte gap following before
@code{file3} is loaded. Then the first 32 bytes of @code{file4} are
placed over the last 32 bytes of @code{file3}.
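A hole can also be bracketed with symbols so that the program can find
it. The following is only a sketch; the symbol names @code{_gap_start}
and @code{_gap_end}, the hole size and the fill value are hypothetical:
@example
SECTIONS
@{
  output :
  @{
    file1(.text)
    _gap_start = . ;
    . = . + 0x100;
    _gap_end = . ;
    file2(.text)
  @} = 0xffff;
@}
@end example
Here @code{_gap_end} lies 0x100 bytes after @code{_gap_start}, and the
hole between them is filled with 0xffff. Because both symbols are
assigned inside the output section, they are created relative to the
base of that section.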
@section Command Language Syntax
@section The Entry Point
The linker chooses the first executable instruction in an output file from a list
of possibilities, in order:
@itemize @bullet
@item
The value of the symbol provided to the command line with the @code{-e} option, when
present.
@item
The value of the symbol provided in the @code{ENTRY} directive,
if present.
@item
The value of the symbol @code{start}, if present.
@item
The value of the symbol @code{_main}, if present.
@item
The address of the first byte of the @code{.text} section, if present.
@item
The value 0.
@end itemize
If the symbol @code{start} is not defined within the set of input
files to a link, it may be generated by a simple assignment
expression, for example:
@example
start = 0x2020;
@end example
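The entry point may also be chosen from the command line. In this
sketch, @code{begin} is a hypothetical symbol assumed to be defined in
the equally hypothetical @code{main.o}; according to the list above, a
symbol given with @code{-e} takes priority over every other candidate,
including @code{start}:
@example
gld -e begin -o prog main.o
@end example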
@section Section Attributes
@section Allocation of Sections into Memory
@section Defining Symbols
@chapter Examples of operation
The simplest case is linking standard Unix object files on a standard
Unix system supported by the linker. To link a file hello.o:
@example
$ gld -o output /lib/crt0.o hello.o -lc
@end example
This tells @code{gld} to produce a file called @code{output} after linking
the file @code{/lib/crt0.o} with @code{hello.o} and the library
@code{libc.a} which will come from the standard search directories.
@chapter Partial Linking
Specifying @code{-r} on the command line causes @code{gld} to
perform a partial link.
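Since the relocatable output of a partial link can in turn serve as
input to @code{gld}, a program may be linked in stages. As a sketch (the
object file names are hypothetical), the first command combines two
object files into one relocatable file and the second completes the
link:
@example
gld -r -o part.o a.o b.o
gld -o prog /lib/crt0.o part.o c.o -lc
@end example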
@chapter BFD
The linker accesses object and archive files using the @code{bfd}
libraries. These libraries allow the linker to use the same routines
to operate on object files whatever the object file format.
A different object file format can be supported simply by creating a
new @code{bfd} back end and adding it to the library.
Formats currently supported:
@itemize @bullet
@item
Sun3 68k a.out
@item
IEEE-695 68k Object Module Format
@item
Oasys 68k Binary Relocatable Object File Format
@item
Sun4 sparc a.out
@item
88k bcs coff
@item
i960 coff little endian
@item
i960 coff big endian
@item
i960 b.out little endian
@item
i960 b.out big endian
@end itemize
As with most implementations, @code{bfd} is a compromise between
several conflicting requirements. The major factor influencing
@code{bfd} design was efficiency: any time used converting between
formats is time which would not have been spent had @code{bfd} not
been involved. This is partly offset by abstraction payback; since
@code{bfd} simplifies applications and back ends, more time and care
may be spent optimizing algorithms for greater speed.
One minor artifact of the @code{bfd} solution which the
user should be aware of is information loss.
There are two places where useful information can be lost using the
@code{bfd} mechanism: during conversion and during output.
@section How it works
When an object file is opened, @code{bfd}
tries to automatically determine the format of the input object file. A
descriptor is then built in memory with pointers to routines to access
elements of the object file's data structures.
As different information from the object files is required,
@code{bfd} reads from different sections of the file and processes
them. For example a very common operation for the linker is processing
symbol tables. Each @code{bfd} back end provides a routine for
converting between the object file's representation of symbols and an
internal canonical format. When the linker asks for the symbol table
of an object file, it calls through the memory pointer to the relevant
@code{bfd} back end routine which reads and converts the table into
the canonical form. The linker then operates upon the common form. When
the link is finished and the linker writes the symbol table of the
output file, another @code{bfd} back end routine is called which takes
the newly created symbol table and converts it into the output format.
@section Information Leaks
@table @bullet{}
@item Information lost during output.
The output formats supported by @code{bfd} do not provide identical
facilities, and information which may be described in one form
has nowhere to go in another format. One example of this would be
alignment information in @code{b.out}. There is nowhere in an @code{a.out}
format file to store alignment information on the contained data, so when
a file is linked from @code{b.out} and an @code{a.out} image is produced,
alignment information is lost. (Note that in this case the linker has the
alignment information internally, so the link is performed correctly).
Another example is COFF section names. COFF files may contain an
unlimited number of sections, each one with a textual section name. If
the target of the link is a format which does not have many sections
(e.g., @code{a.out}) or has sections without names (e.g., the Oasys format),
the link cannot be done simply. It is possible to circumvent this
problem by describing the desired input section to output section
mapping with the command language.
@item Information lost during canonicalization.
The @code{bfd}
internal canonical form of the external formats is not exhaustive;
there are structures in input formats for which there is no direct
representation internally. This means that the @code{bfd} back ends
cannot maintain all the data richness through the transformation
from external to internal and back to external formats.
This limitation is only a problem when using the linker to read one
format and write another. Each @code{bfd} back end is responsible for
maintaining as much data as possible, and the internal @code{bfd}
canonical form has structures which are opaque to the @code{bfd} core,
and exported only to the back ends. When a file is read in one format,
the canonical form is generated for @code{bfd} and the linker. At the
same time, the back end saves away any information which may otherwise
be lost. If the data is then written back to the same back end, the
back end routine will be able to use the canonical form provided by
the @code{bfd} core as well as the information it prepared earlier.
Since there is a great deal of commonality between back ends, this
mechanism is very useful. There is no information lost when linking
big endian COFF to little endian COFF, or from a.out to b.out. When a
mixture of formats is linked, information is only lost from the
files whose format differs from that of the destination.
@end table
@section Mechanism
The smallest amount of information is preserved when there
is little overlap between the information provided by the source
format, that stored by the canonical format and the information needed
by the destination format. A brief description of the canonical form
will help the user appreciate what can be maintained
between conversions.
@table @bullet
@item file level
Information on target machine
architecture, particular implementation and format type is stored on
a per file basis. Other information includes a demand pageable bit and
a write protected bit. Note that information like Unix magic numbers
is not stored here, only the magic number's meaning, so a ZMAGIC file
would have both the demand pageable bit and the write protected text
bit set.
The byte order of the target is stored on a per file basis, so that
both big and little endian object files may be linked together at the
same time.
@item section level
Each section in the input file contains the name of the section, the
original address in the object file, various flags, size and alignment
information and pointers into other @code{bfd} data structures.
@item symbol level
Each symbol contains a pointer to the object file which originally
defined it, its name, value and various flag bits. When a symbol
table is read in, all symbols are relocated to make them relative to
the base of the section they were defined in, so each symbol points to
the containing section. Each symbol also has a varying amount of
hidden data to contain private data for the back end. Since the symbol
points to the original file, the symbol private data format is
accessible. Operations may be done to a list of symbols of wildly
different formats without problems.
Normal global and simple local symbols are maintained on output, so an
output file, no matter the format will retain symbols pointing to
functions, globals, statics and commons. Some symbol information is
not worth retaining; in @code{a.out} type information is stored in the
symbol table as long symbol names. This information would be useless
to most coff debuggers and may be thrown away with appropriate command
line switches. (Note that gdb does support stabs in coff).
There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (e.g., COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates) the information will be preserved.
@item relocation level
Each canonical relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in and a pointer to a relocation type descriptor. Relocation is
performed effectively by message passing through the relocation type
descriptor and symbol pointer. It allows relocations to be performed
on output data using a relocation method only available in one of the
input formats. For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a COFF file, even though 68k COFF
has no such relocation type.
@item line numbers
Line numbers have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to
the first record of the list. The head of a line number list consists
of a pointer to the symbol, which allows divination of the address of
the function whose line number is being described. The rest of the
list is made up of tuples of offsets into the section and line indexes. Any format
which can simply derive this information can pass it without loss
between formats (COFF, IEEE and Oasys).
@end table
@bye