Commit Graph

5 Commits

Author SHA1 Message Date
Alan Modra b3adc24a07 Update year range in copyright notice of binutils files 2020-01-01 18:42:54 +10:30
Nick Alcock de07e349be libctf: remove ctf_malloc, ctf_free and ctf_strdup
These just get in the way of auditing for erroneous usage of strdup and
add a huge irregular surface of "ctf_malloc or malloc? ctf_free or free?
ctf_strdup or strdup?"

ctf_malloc and ctf_free usage has not reliably matched up for many
years, if ever, making the whole game pointless.

Go back to malloc, free, and strdup like everyone else: while we're at
it, fix a bunch of places where we weren't properly checking for OOM.
This changes the interface of ctf_cuname_set and ctf_parent_name_set,
which could strdup but could not return errors (like ENOMEM).
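The new calling convention can be sketched as follows. This is a minimal sketch, not libctf's code: `example_dict` and `example_cuname_set` are hypothetical stand-ins for `ctf_file_t` and `ctf_cuname_set`. The setter copies with plain `strdup`, releases with plain `free`, and reports allocation failure to its caller instead of silently dropping it.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dictionary that owns a copy of its CU name.
   Not libctf's ctf_file_t; illustrative only.  */
struct example_dict
{
  char *cuname;
  int errno_val;   /* last error, in the spirit of ctf_errno () */
};

/* Replace the stored name with a private copy.  Returns 0 on success,
   or -1 with errno_val set to ENOMEM if the copy cannot be allocated.
   On failure the old name is left untouched.  */
int
example_cuname_set (struct example_dict *d, const char *name)
{
  char *copy;

  assert (d != NULL && name != NULL);
  copy = strdup (name);
  if (copy == NULL)
    {
      d->errno_val = ENOMEM;
      return -1;
    }

  free (d->cuname);   /* plain free () pairs with plain strdup () */
  d->cuname = copy;
  return 0;
}
```

Because the function now returns an int, callers can propagate the OOM rather than carrying on with a stale or missing name.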

New in v4.

include/
	* ctf-api.h (ctf_cuname_set): Can now fail, returning int.
	(ctf_parent_name_set): Likewise.
libctf/
	* ctf-impl.h (ctf_alloc): Remove.
	(ctf_free): Likewise.
	(ctf_strdup): Likewise.
	* ctf-subr.c (ctf_alloc): Remove.
	(ctf_free): Likewise.
	* ctf-util.c (ctf_strdup): Remove.

	* ctf-create.c (ctf_serialize): Use malloc, not ctf_alloc; free, not
	ctf_free; strdup, not ctf_strdup.
	(ctf_dtd_delete): Likewise.
	(ctf_dvd_delete): Likewise.
	(ctf_add_generic): Likewise.
	(ctf_add_function): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable): Likewise.
	(membadd): Likewise.
	(ctf_compress_write): Likewise.
	(ctf_write_mem): Likewise.
	* ctf-decl.c (ctf_decl_push): Likewise.
	(ctf_decl_fini): Likewise.
	(ctf_decl_sprintf): Likewise.  Check for OOM.
	* ctf-dump.c (ctf_dump_append): Use malloc, not ctf_alloc; free, not
	ctf_free; strdup, not ctf_strdup.
	(ctf_dump_free): Likewise.
	(ctf_dump): Likewise.
	* ctf-open.c (upgrade_types_v1): Likewise.
	(init_types): Likewise.
	(ctf_file_close): Likewise.
	(ctf_bufopen_internal): Likewise.  Check for OOM.
	(ctf_parent_name_set): Likewise: report the OOM to the caller.
	(ctf_cuname_set): Likewise.
	(ctf_import): Likewise.
	* ctf-string.c (ctf_str_purge_atom_refs): Use malloc, not ctf_alloc;
	free, not ctf_free; strdup, not ctf_strdup.
	(ctf_str_free_atom): Likewise.
	(ctf_str_create_atoms): Likewise.
	(ctf_str_add_ref_internal): Likewise.
	(ctf_str_remove_ref): Likewise.
	(ctf_str_write_strtab): Likewise.
2019-10-03 17:04:56 +01:00
Nick Alcock 676c3ecbad libctf: avoid the need to ever use ctf_update
The method of operation of libctf when the dictionary is writable has
before now been that types that are added land in the dynamic type
section, which is a linked list and hash of IDs -> dynamic type
definitions (and, recently, a hash of names): the DTDs are a bit of CTF
representing the ctf_type_t and ad hoc C structures representing the
vlen.  Historically, libctf was unable to do anything with these types,
not even look them up by ID, let alone by name: if you wanted to do that
(say, if you were adding a type that depended on one you just added) you
called ctf_update, which serializes all the DTDs into a CTF file and
reopens it, copying its guts over the fp it's called with.  The
ctf_updated types are then frozen in amber and unchangeable: all lookups
will return the types in the static portion in preference to the dynamic
portion, and we will refuse to re-add things that already exist in the
static portion (and, of late, in the dynamic portion too).  The libctf
machinery remembers the boundary between static and dynamic types and
looks in the right portion for each type.  Lots of things still don't
quite work with dynamic types (e.g. getting their size), but enough
works to do a bunch of additions and then a ctf_update, most of the
time.

Except it doesn't, because ctf_add_type finds it necessary to walk the
full dynamic type definition list looking for types with matching names,
so it gets slower and slower with every type you add: fixing this
requires calling ctf_update periodically for no other reason than to
avoid massively slowing things down.

This is all clunky and very slow but kind of works, until you consider
that it is in fact possible and indeed necessary to modify one sort of
type after it has been added: forwards.  These are necessarily promoted
to structs, unions or enums, and when this happens *their type ID does not
change*.  So all of a sudden we are changing types that already exist in
the static portion.  ctf_update gets massively confused by this and
allocates only enough space for the forward (with no members), but then emits
the new dynamic type (with all the members) into it.  You get an
assertion failure after that, if you're lucky, or a coredump.

So this commit rejigs things a bit and arranges to exclusively use the
dynamic type definitions in writable dictionaries, and the static type
definitions in readable dictionaries: we don't at any time have a mixture
of static and dynamic types, and you don't need to call ctf_update to
make things "appear".  The ctf_dtbyname hash I introduced a few months
ago, which maps things like "struct foo" to DTDs, is removed, replaced
instead by a change of type of the four dictionaries which track names.
Rather than just being (unresizable) ctf_hash_t's populated only at
ctf_bufopen time, they are now a ctf_names_t structure, which is a pair
of ctf_hash_t and ctf_dynhash_t, with the ctf_hash_t portion being used
in readonly dictionaries, and the ctf_dynhash_t being used in writable
ones.  The decision as to which to use is centralized in the new
functions ctf_lookup_by_rawname (which takes a type kind) and
ctf_lookup_by_rawhash, which it calls (which takes a ctf_names_t *).
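The centralized read-only/writable decision can be sketched like this. It is a simplified model under stated assumptions: every `example_*` name is hypothetical, and small arrays searched linearly stand in for libctf's `ctf_hash_t` and `ctf_dynhash_t`. The point is that only one function knows which half of the pair to consult, and only one maps a kind to its table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a ctf_names_t-style pair.  Real libctf pairs a
   ctf_hash_t (read-only) with a ctf_dynhash_t (writable); arrays searched
   linearly stand in for both here.  */
struct name_entry
{
  const char *name;
  unsigned long type;
};

struct example_names
{
  const struct name_entry *frozen;   /* populated once, at open time */
  size_t nfrozen;
  const struct name_entry *dyn;      /* grows as types are added */
  size_t ndyn;
};

enum example_kind { EX_STRUCT, EX_UNION, EX_ENUM, EX_OTHER };

struct example_dict
{
  int writable;
  struct example_names structs, unions, enums, names;
};

static unsigned long
lookup_in (const struct name_entry *tab, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (tab[i].name, name) == 0)
      return tab[i].type;
  return 0;                          /* 0 means "not found" in this sketch */
}

/* The one place that knows which half to consult.  */
unsigned long
example_lookup_by_rawhash (const struct example_dict *d,
                           const struct example_names *names, const char *name)
{
  assert (d != NULL && names != NULL && name != NULL);
  if (d->writable)
    return lookup_in (names->dyn, names->ndyn, name);
  return lookup_in (names->frozen, names->nfrozen, name);
}

/* Map a kind to its name table, then delegate.  */
unsigned long
example_lookup_by_rawname (const struct example_dict *d,
                           enum example_kind kind, const char *name)
{
  const struct example_names *names;

  switch (kind)
    {
    case EX_STRUCT: names = &d->structs; break;
    case EX_UNION:  names = &d->unions;  break;
    case EX_ENUM:   names = &d->enums;   break;
    default:        names = &d->names;   break;
    }
  return example_lookup_by_rawhash (d, names, name);
}
```

Flipping `writable` switches every lookup from the frozen half to the dynamic half, and no caller has to change.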

This change lets us switch from using static to dynamic name hashes on
the fly across the entirety of libctf without complexifying anything: in
fact, because we now centralize the knowledge about how to map from type
kind to name hash, it actually simplifies things and lets us throw out
quite a lot of now-unnecessary complexity, from ctf_dtbyname (replaced
by the dynamic half of the name tables), through to ctf_dtnextid (now
that a dictionary's static portion is never referenced if the dictionary
is writable, we can just use ctf_typemax to indicate the maximum type:
dynamic or non-dynamic does not matter, and we no longer need to track
the boundary between the types).  You can now ctf_rollback() as far as
you like, even past a ctf_update or for that matter a full writeout; all
the iteration functions work just as well on writable as on read-only
dictionaries; ctf_add_type no longer needs expensive duplicated code to
run over the dynamic types hunting for ones it might be interested in;
and the linker no longer needs a hack to call ctf_update so that calling
ctf_add_type is not impossibly expensive.

There is still a bit more complexity: some new code paths in ctf-types.c
need to know how to extract information from dynamic types.  This
complexity will go away again in a few months when libctf acquires a
proper intermediate representation.

You can still call ctf_update if you like (it's public API, after all),
but its only effect now is to set the point to which ctf_discard rolls
back.
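Once a single counter in the style of ctf_typemax is all that tracks types, snapshot, rollback and the reduced ctf_update become a few lines each. The following is a hypothetical sketch: the `example_*` names, the fixed-size table and the choice to reject only "forward" rollbacks are assumptions for illustration, not libctf's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical writable dictionary: one high-water mark in the style of
   ctf_typemax, no static/dynamic boundary.  Illustrative only.  */
#define EX_MAX_TYPES 64

struct example_dict
{
  const char *types[EX_MAX_TYPES + 1];  /* indexed by type ID; 0 unused */
  unsigned long typemax;                /* highest type ID in use */
  unsigned long discard_point;          /* where a discard rolls back to */
};

/* Add a type; returns its new ID, or 0 if the table is full.  */
unsigned long
example_add_type (struct example_dict *d, const char *name)
{
  if (d->typemax >= EX_MAX_TYPES)
    return 0;
  d->types[++d->typemax] = name;
  return d->typemax;
}

/* A snapshot is just the current high-water mark.  */
unsigned long
example_snapshot (const struct example_dict *d)
{
  return d->typemax;
}

/* Drop every type added after ID.  Any earlier point is valid, even one
   before the last update; only rolling "forward" is rejected here.  */
int
example_rollback (struct example_dict *d, unsigned long id)
{
  assert (d != NULL);
  if (id > d->typemax)
    return -1;
  while (d->typemax > id)
    d->types[d->typemax--] = NULL;
  if (d->discard_point > id)
    d->discard_point = id;
  return 0;
}

/* All that an update still does: move the discard point.  */
void
example_update (struct example_dict *d)
{
  d->discard_point = d->typemax;
}

/* Roll back to the most recent update.  */
int
example_discard (struct example_dict *d)
{
  return example_rollback (d, d->discard_point);
}
```

For example, after adding one type, updating, and adding two more, a discard leaves only the first type, and a further rollback to 0 succeeds even though it crosses the update.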

Obviously *something* still needs to serialize the CTF file before
writeout, and this job is done by ctf_serialize, which does everything
ctf_update used to do, except set the counter used by ctf_discard.  It is
automatically called by the various functions that do CTF writeout:
nobody else ever needs to call it.

With this in place, forwards that are promoted to non-forwards no longer
crash the link, even if it happens tens of thousands of types later.

v5: fix tabdamage.

libctf/
	* ctf-impl.h (ctf_names_t): New.
	(ctf_lookup_t) <ctf_hash>: Now a ctf_names_t, not a ctf_hash_t.
	(ctf_file_t) <ctf_structs>: Likewise.
	<ctf_unions>: Likewise.
	<ctf_enums>: Likewise.
	<ctf_names>: Likewise.
	<ctf_lookups>: Improve comment.
	<ctf_ptrtab_len>: New.
	<ctf_prov_strtab>: New.
	<ctf_str_prov_offset>: New.
	<ctf_dtbyname>: Remove, redundant to the names hashes.
	<ctf_dtnextid>: Remove, redundant to ctf_typemax.
	(ctf_dtdef_t) <dtd_name>: Remove.
	<dtd_data>: Note that the ctt_name is now populated.
	(ctf_str_atom_t) <csa_offset>: This is now the strtab
	offset for internal strings too.
	<csa_external_offset>: New, the external strtab offset.
	(CTF_INDEX_TO_TYPEPTR): Handle the LCTF_RDWR case.
	(ctf_name_table): New declaration.
	(ctf_lookup_by_rawname): Likewise.
	(ctf_lookup_by_rawhash): Likewise.
	(ctf_set_ctl_hashes): Likewise.
	(ctf_serialize): Likewise.
	(ctf_dtd_insert): Adjust.
	(ctf_simple_open_internal): Likewise.
	(ctf_bufopen_internal): Likewise.
	(ctf_list_empty_p): Likewise.
	(ctf_str_remove_ref): Likewise.
	(ctf_str_add): Returns uint32_t now.
	(ctf_str_add_ref): Likewise.
	(ctf_str_add_external): Now returns a boolean (int).
	* ctf-string.c (ctf_strraw_explicit): Check the ctf_prov_strtab
	for strings in the appropriate range.
	(ctf_str_create_atoms): Create the ctf_prov_strtab.  Detect OOM
	when adding the null string to the new strtab.
	(ctf_str_free_atoms): Destroy the ctf_prov_strtab.
	(ctf_str_add_ref_internal): Add make_provisional argument.  If
	make_provisional, populate the offset and fill in the
	ctf_prov_strtab accordingly.
	(ctf_str_add): Return the offset, not the string.
	(ctf_str_add_ref): Likewise.
	(ctf_str_add_external): Return a success integer.
	(ctf_str_remove_ref): New, remove a single ref.
	(ctf_str_count_strtab): Do not count the initial null string's
	length or the existence or length of any unreferenced internal
	atoms.
	(ctf_str_populate_sorttab): Skip atoms with no refs.
	(ctf_str_write_strtab): Populate the nullstr earlier.  Add one
	to the cts_len for the null string, since it is no longer done
	in ctf_str_count_strtab.  Adjust for csa_external_offset rename.
	Populate the csa_offset for both internal and external cases.
	Flush the ctf_prov_strtab afterwards, and reset the
	ctf_str_prov_offset.
	* ctf-create.c (ctf_grow_ptrtab): New.
	(ctf_create): Call it.  Initialize new fields rather than old
	ones.  Tell ctf_bufopen_internal that this is a writable dictionary.
	Set the ctl hashes and data model.
	(ctf_update): Rename to...
	(ctf_serialize): ... this.  Leave a compatibility function behind.
	Tell ctf_simple_open_internal that this is a writable dictionary.
	Pass the new fields along from the old dictionary.  Drop
	ctf_dtnextid and ctf_dtbyname.  Use ctf_strraw, not dtd_name.
	Do not zero out the DTD's ctt_name.
	(ctf_prefixed_name): Rename to...
	(ctf_name_table): ... this.  No longer return a prefixed name: return
	the applicable name table instead.
	(ctf_dtd_insert): Use it, and use the right name table.  Pass in the
	kind we're adding.  Migrate away from dtd_name.
	(ctf_dtd_delete): Adjust similarly.  Remove the ref to the
	deleted ctt_name.
	(ctf_dtd_lookup_type_by_name): Remove.
	(ctf_dynamic_type): Always return NULL on read-only dictionaries.
	No longer check ctf_dtnextid: check ctf_typemax instead.
	(ctf_snapshot): No longer use ctf_dtnextid: use ctf_typemax instead.
	(ctf_rollback): Likewise.  No longer fail with ECTF_OVERROLLBACK. Use
	ctf_name_table and the right name table, and migrate away from
	dtd_name as in ctf_dtd_delete.
	(ctf_add_generic): Pass in the kind explicitly and pass it to
	ctf_dtd_insert. Use ctf_typemax, not ctf_dtnextid.  Migrate away
	from dtd_name to using ctf_str_add_ref to populate the ctt_name.
	Grow the ptrtab if needed.
	(ctf_add_encoded): Pass in the kind.
	(ctf_add_slice): Likewise.
	(ctf_add_array): Likewise.
	(ctf_add_function): Likewise.
	(ctf_add_typedef): Likewise.
	(ctf_add_reftype): Likewise. Initialize the ctf_ptrtab, checking
	ctt_name rather than dtd_name.
	(ctf_add_struct_sized): Pass in the kind.  Use
	ctf_lookup_by_rawname, not ctf_hash_lookup_type /
	ctf_dtd_lookup_type_by_name.
	(ctf_add_union_sized): Likewise.
	(ctf_add_enum): Likewise.
	(ctf_add_enum_encoded): Likewise.
	(ctf_add_forward): Likewise.
	(ctf_add_type): Likewise.
	(ctf_compress_write): Call ctf_serialize: adjust for ctf_size not
	being initialized until after the call.
	(ctf_write_mem): Likewise.
	(ctf_write): Likewise.
	* ctf-archive.c (arc_write_one_ctf): Likewise.
	* ctf-lookup.c (ctf_lookup_by_name): Use ctf_lookup_by_rawhash, not
	ctf_hash_lookup_type.
	(ctf_lookup_by_id): No longer check the readonly types if the
	dictionary is writable.
	* ctf-open.c (init_types): Assert that this dictionary is not
	writable.  Adjust to use the new name hashes, ctf_name_table,
	and ctf_ptrtab_len.  GNU style fix for the final ptrtab scan.
	(ctf_bufopen_internal): New 'writable' parameter.  Flip on LCTF_RDWR
	if set.  Drop out early when dictionary is writable.  Split the
	ctf_lookups initialization into...
	(ctf_set_ctl_hashes): ... this new function.
	(ctf_simple_open_internal): Adjust.  New 'writable' parameter.
	(ctf_simple_open): Adjust accordingly.
	(ctf_bufopen): Likewise.
	(ctf_file_close): Destroy the appropriate name hashes.  No longer
	destroy ctf_dtbyname, which is gone.
	(ctf_getdatasect): Remove spurious "extern".
	* ctf-types.c (ctf_lookup_by_rawname): New, look up types in the
	specified name table, given a kind.
	(ctf_lookup_by_rawhash): Likewise, given a ctf_names_t *.
	(ctf_member_iter): Add support for iterating over the
	dynamic type list.
	(ctf_enum_iter): Likewise.
	(ctf_variable_iter): Likewise.
	(ctf_type_rvisit): Likewise.
	(ctf_member_info): Add support for types in the dynamic type list.
	(ctf_enum_name): Likewise.
	(ctf_enum_value): Likewise.
	(ctf_func_type_info): Likewise.
	(ctf_func_type_args): Likewise.
	* ctf-link.c (ctf_accumulate_archive_names): No longer call
	ctf_update.
	(ctf_link_write): Likewise.
	(ctf_link_intern_extern_string): Adjust for new
	ctf_str_add_external return value.
	(ctf_link_add_strtab): Likewise.
	* ctf-util.c (ctf_list_empty_p): New.
2019-10-03 17:04:56 +01:00
Nick Alcock d851ecd373 libctf: support getting strings from the ELF strtab
The CTF file format has always supported "external strtabs", which
internally are strtab offsets with their MSB on: such refs
get their strings from the strtab passed in at CTF file open time:
this is usually intended to be the ELF strtab, and that's what this
implementation is meant to support, though in theory the external
strtab could come from anywhere.

This commit adds support for these external strings in the ctf-string.c
strtab tracking layer.  It's quite easy: we just add a field csa_offset
to the atoms table that tracks all strings: this field tracks the offset
of the string in the ELF strtab (with its MSB already on, courtesy of a
new macro CTF_SET_STID).  We also add a new function that sets the
csa_offset to the specified offset (plus MSB).  Then we just need to
avoid writing out strings to the internal strtab if they have csa_offset
set, and note that the internal strtab is shorter than it might
otherwise be.
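The MSB encoding and the shortened internal strtab can be sketched as below. This is an assumption-laden illustration: the `EX_*` macros only mirror the idea of the name-reference encoding (the real CTF_SET_STID and friends live in include/ctf.h and may differ in detail), and `example_atom` is a hypothetical reduction of the atoms table entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative name-reference encoding: MSB selects the strtab, the low
   31 bits are the offset.  Hypothetical names, not include/ctf.h.  */
#define EX_MAX_NAME 0x7fffffffu
#define EX_NAME_STID(name) ((uint32_t) (name) >> 31)
#define EX_NAME_OFFSET(name) ((uint32_t) (name) & EX_MAX_NAME)
#define EX_SET_STID(name, stid) ((uint32_t) (name) | ((uint32_t) (stid) << 31))

/* Hypothetical atom.  ext_offset is 0 if the string is internal; an
   external string always has the MSB set, so it is never 0 even when
   its ELF strtab offset is 0.  */
struct example_atom
{
  const char *str;
  uint32_t ext_offset;
};

/* Record that this string already lives in the external (ELF) strtab.  */
void
example_add_external (struct example_atom *atom, uint32_t elf_offset)
{
  assert (elf_offset <= EX_MAX_NAME);
  atom->ext_offset = EX_SET_STID (elf_offset, 1);
}

/* Internal strtab length: the leading NUL plus every string that is not
   external.  External strings are never written out, so they cost 0.  */
size_t
example_count_internal (const struct example_atom *atoms, size_t n)
{
  size_t len = 1;

  for (size_t i = 0; i < n; i++)
    if (atoms[i].ext_offset == 0)
      len += strlen (atoms[i].str) + 1;
  return len;
}
```

With atoms "int", "main" and "foo" the internal length is 1 + 4 + 5 + 4 = 14 bytes; marking "main" as external drops it to 9.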

(We could in theory save a little more time here by eschewing sorting
such strings, since we never actually write the strings out anywhere,
but that would mean storing them separately and it's just not worth the
complexity cost until profiling shows it's worth doing.)

We also have to go through a bit of extra effort at variable-sorting
time.  This was previously using direct references to the internal
strtab: it couldn't use ctf_strptr or ctf_strraw because the new strtab
is not yet ready to put in its usual field (in a ctf_file_t that hasn't
even been allocated yet at this stage): but now we're using the external
strtab, this will no longer do because it'll be looking things up in the
wrong strtab, with disastrous results.  Instead, pass the new internal
strtab in to a new ctf_strraw_explicit function which is just like
ctf_strraw except you can specify a new internal strtab to use.

But even now that it is using a new internal strtab, this is not quite
enough: it can't look up strings in the external strtab because ld
hasn't written it out yet, and when it does, it will write it straight to
disk.  Instead, when we write the internal strtab, note all the offset
-> string mappings that we have noted belong in the *external* strtab to
a new "synthetic external strtab" dynhash, ctf_syn_ext_strtab, and look
in there at ctf_strraw time if it is set.  This uses minimal extra
memory (because only strings in the external strtab that we actually use
are stored, and even those come straight out of the atoms table), but
lets both variable sorting and name interning work fine when ctf_bufopen
is next called.  (This also means that we don't need to filter out
spurious ECTF_STRTAB warnings from ctf_bufopen but can pass them back to
the caller, once we wrap ctf_bufopen so that we have a new internal
variant of ctf_bufopen etc that we can pass the synthetic external
strtab to. That error has been filtered out since the days of Solaris
libctf, which didn't try to handle the problem of getting external
strtabs right at construction time at all.)
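The lookup rule described in the last two paragraphs can be sketched as a single function. It is hypothetical throughout: `example_strraw_explicit`, `example_strtab` and `example_syn_ext` only model the idea of ctf_strraw_explicit and ctf_syn_ext_strtab, and a linear scan stands in for the dynhash libctf actually uses. Refs with the MSB clear resolve against an explicitly passed internal strtab; refs with the MSB set resolve against the synthetic external table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_MAX_NAME 0x7fffffffu

/* Hypothetical synthetic-external-strtab entry: the offset a string will
   have in the not-yet-written ELF strtab, and the string itself.  */
struct example_syn_ext
{
  uint32_t offset;
  const char *str;
};

/* Hypothetical internal strtab, passed in explicitly because it is not
   yet attached to any dictionary.  */
struct example_strtab
{
  const char *data;
  size_t len;
};

/* Resolve a name ref.  Returns NULL if it cannot be resolved.  */
const char *
example_strraw_explicit (uint32_t name, const struct example_strtab *internal,
                         const struct example_syn_ext *syn, size_t nsyn)
{
  uint32_t off = name & EX_MAX_NAME;

  assert (syn != NULL || nsyn == 0);
  if ((name >> 31) == 0)                 /* internal strtab */
    {
      if (internal == NULL || off >= internal->len)
        return NULL;
      return internal->data + off;
    }

  for (size_t i = 0; i < nsyn; i++)      /* libctf uses a dynhash here */
    if (syn[i].offset == off)
      return syn[i].str;
  return NULL;
}
```

Only strings actually referenced from the external strtab ever appear in the synthetic table, which is why the memory cost stays small.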

v3: add the synthetic strtab and all associated machinery.
v5: fix tabdamage.

include/
	* ctf.h (CTF_SET_STID): New.

libctf/
	* ctf-impl.h (ctf_str_atom_t) <csa_offset>: New field.
	(ctf_file_t) <ctf_syn_ext_strtab>: Likewise.
	(ctf_str_add_ref): Name the last arg.
	(ctf_str_add_external): New.
	(ctf_strraw_explicit): Likewise.
	(ctf_simple_open_internal): Likewise.
	(ctf_bufopen_internal): Likewise.

	* ctf-string.c (ctf_strraw_explicit): Split from...
	(ctf_strraw): ... here, with new support for ctf_syn_ext_strtab.
	(ctf_str_add_ref_internal): Return the atom, not the
	string.
	(ctf_str_add): Adjust accordingly.
	(ctf_str_add_ref): Likewise.  Move up in the file.
	(ctf_str_add_external): New: update the csa_offset.
	(ctf_str_count_strtab): Only account for strings with no csa_offset
	in the internal strtab length.
	(ctf_str_write_strtab): If the csa_offset is set, update the
	string's refs without writing the string out, and update the
	ctf_syn_ext_strtab.  Make OOM handling less ugly.
	* ctf-create.c (struct ctf_sort_var_arg_cb): New.
	(ctf_update): Handle failure to populate the strtab.  Pass in the
	new ctf_sort_var arg.  Adjust for ctf_syn_ext_strtab addition.
	Call ctf_simple_open_internal, not ctf_simple_open.
	(ctf_sort_var): Call ctf_strraw_explicit rather than looking up
	strings by hand.
	* ctf-hash.c (ctf_hash_insert_type): Likewise (but using
	ctf_strraw).  Adjust to diagnose ECTF_STRTAB nonetheless.
	* ctf-open.c (init_types): No longer filter out ECTF_STRTAB.
	(ctf_file_close): Destroy the ctf_syn_ext_strtab.
	(ctf_simple_open): Rename to, and reimplement as a wrapper around...
	(ctf_simple_open_internal): ... this new function, which calls
	ctf_bufopen_internal.
	(ctf_bufopen): Rename to, and reimplement as a wrapper around...
	(ctf_bufopen_internal): ... this new function, which sets
	ctf_syn_ext_strtab.
2019-10-03 17:04:55 +01:00
Nick Alcock f5e9c9bde0 libctf: deduplicate and sort the string table
ctf.h states:

> [...] the CTF string table does not contain any duplicated strings.

Unfortunately this is entirely untrue: libctf has before now made no
attempt whatsoever to deduplicate the string table. It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer.  There is no global
view of the strings and no deduplication.

Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.

Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adding the location of the reference to this atom to the refs list
attached to the atom: this works reliably as long as one takes care to
only call ctf_str_add_ref() once the final location of the offset is
known (so you can't call it on a temporary structure and then memcpy()
that structure into place in the CTF buffer, because the ref will still
point to the old location: ctf_update() changes accordingly).
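The add-a-reference step can be sketched as below. It is a simplified, hypothetical model: `example_atoms` and `example_str_add_ref` are illustrative names, a growable array with linear search stands in for the ctf_str_atoms dynhash, and strings are not copied (the caller must keep them alive). Each unique string gets one atom; each call appends the address that must later receive the string's offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical atom: one unique string plus every location that must be
   patched with its final strtab offset.  */
struct example_atom
{
  const char *str;        /* not copied: caller keeps it alive */
  uint32_t **refs;
  size_t nrefs;
};

struct example_atoms
{
  struct example_atom *atoms;
  size_t natoms, cap;
};

static struct example_atom *
find_or_add_atom (struct example_atoms *t, const char *str)
{
  for (size_t i = 0; i < t->natoms; i++)
    if (strcmp (t->atoms[i].str, str) == 0)
      return &t->atoms[i];            /* duplicate: reuse the atom */

  if (t->natoms == t->cap)
    {
      size_t ncap = t->cap ? t->cap * 2 : 8;
      struct example_atom *n = realloc (t->atoms, ncap * sizeof *n);

      if (n == NULL)
        return NULL;
      t->atoms = n;
      t->cap = ncap;
    }
  t->atoms[t->natoms] = (struct example_atom) { str, NULL, 0 };
  return &t->atoms[t->natoms++];
}

/* Add STR if new and record that *REF must receive its offset.  REF must
   already point at its final location in the output buffer.  Returns 0,
   or -1 on OOM.  */
int
example_str_add_ref (struct example_atoms *t, const char *str, uint32_t *ref)
{
  struct example_atom *a;
  uint32_t **nrefs;

  assert (t != NULL && str != NULL && ref != NULL);
  if ((a = find_or_add_atom (t, str)) == NULL)
    return -1;
  nrefs = realloc (a->refs, (a->nrefs + 1) * sizeof *nrefs);
  if (nrefs == NULL)
    return -1;
  a->refs = nrefs;
  a->refs[a->nrefs++] = ref;
  return 0;
}
```

Adding "int" twice and "char" once yields two atoms, with the "int" atom carrying both reference addresses.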

Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go.  The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'.  Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
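The layout pass can be sketched as follows, under the same hypothetical atom model (illustrative names, not libctf's ctf_str_write_strtab). The empty string always lands at offset 0, the remaining strings are sorted and emitted once each, and every recorded reference is patched with its string's final offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical atom: a unique string and the locations to patch.  */
struct example_atom
{
  const char *str;
  uint32_t **refs;
  size_t nrefs;
};

static int
cmp_atoms (const void *a, const void *b)
{
  struct example_atom *const *x = a;
  struct example_atom *const *y = b;
  return strcmp ((*x)->str, (*y)->str);
}

/* Lay out a strtab and patch all refs.  Returns a malloc'd buffer and
   stores its length in *LENP, or returns NULL on OOM.  */
char *
example_write_strtab (struct example_atom *atoms, size_t n, size_t *lenp)
{
  struct example_atom **sorted;
  size_t len = 1, pos = 1;
  char *buf;

  assert (lenp != NULL);
  if ((sorted = malloc ((n ? n : 1) * sizeof *sorted)) == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    {
      sorted[i] = &atoms[i];
      if (atoms[i].str[0] != '\0')       /* "" shares offset 0 */
        len += strlen (atoms[i].str) + 1;
    }
  qsort (sorted, n, sizeof *sorted, cmp_atoms);   /* helps compressors */

  if ((buf = malloc (len)) == NULL)
    {
      free (sorted);
      return NULL;
    }
  buf[0] = '\0';                         /* offset 0 means "no name" */

  for (size_t i = 0; i < n; i++)
    {
      const char *s = sorted[i]->str;
      uint32_t off = 0;

      if (s[0] != '\0')
        {
          size_t slen = strlen (s) + 1;
          memcpy (buf + pos, s, slen);
          off = (uint32_t) pos;
          pos += slen;
        }
      for (size_t j = 0; j < sorted[i]->nrefs; j++)
        *sorted[i]->refs[j] = off;       /* patch every reference */
    }
  free (sorted);
  *lenp = len;
  return buf;
}
```

For atoms "int" (two refs), "char" and "" the result is the 10-byte table "\0char\0int\0": "char" refs become 1, both "int" refs become 6, and the "" ref becomes 0. Once this returns, the ref pointers are no longer needed, matching the purge described above.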

The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.

No change in externally-visible API or file format (other than the
reduction in pointless redundancy).

libctf/
	* ctf-impl.h (struct ctf_strs_writable): New, non-const version of
	struct ctf_strs.
	(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
	(struct ctf_str_atom): New, disambiguated single string.
	(struct ctf_str_atom_ref): New, points to some other location that
	references this string's offset.
	(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
	Remove member ctf_dtvstrlen: we no longer track the total strlen
	as we add strings.
	(ctf_str_create_atoms): Declare new function in ctf-string.c.
	(ctf_str_free_atoms): Likewise.
	(ctf_str_add): Likewise.
	(ctf_str_add_ref): Likewise.
	(ctf_str_purge_refs): Likewise.
	(ctf_str_write_strtab): Likewise.
	(ctf_realloc): Declare new function in ctf-util.c.

	* ctf-open.c (ctf_bufopen): Create the atoms table.
	(ctf_file_close): Destroy it.
	* ctf-create.c (ctf_update): Copy-and-free it on update.  No longer
	special-case the position of the parname string.  Construct the
	strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
	rest of each buffer element is constructed, not via open-coding:
	realloc the CTF buffer and append the strtab to it.  No longer
	maintain ctf_dtvstrlen.  Sort the variable entry table later, after
	strtab construction.
	(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
	(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
	after buffer element construction instead.
	(ctf_copy_lmembers): Likewise.
	(ctf_copy_emembers): Likewise.
	(ctf_create): No longer maintain the ctf_dtvstrlen.
	(ctf_dtd_delete): Likewise.
	(ctf_dvd_delete): Likewise.
	(ctf_add_generic): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable): Likewise.
	(membadd): Likewise.
	* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
	if there are active ctf_str_num_refs.
	(ctf_strraw): Move to ctf-string.c.
	(ctf_strptr): Likewise.
	* ctf-string.c: New file, strtab manipulation.

	* Makefile.am (libctf_a_SOURCES): Add it.
	* Makefile.in: Regenerate.
2019-07-01 11:05:59 +01:00