Commit Graph

3 Commits

Author SHA1 Message Date
Tom de Vries 85e7588dc4 [gdb/contrib] Improve words extraction in words.sh script
Remove more punctuation and quoting in words.sh script.

gdb/ChangeLog:

2019-11-22  Tom de Vries  <tdevries@suse.de>

	* contrib/words.sh: Improve words extraction.

Change-Id: I1d9eea165731af4e6c4e1c7e09aed9b07af6395c
2019-11-22 16:23:22 +01:00
Tom de Vries f618007364 [gdb/contrib] Combine sed invocations in words.sh script
Currently running words.sh on all the c source and header files in the repo
takes ~16s in user time:
...
$ time ./gdb/contrib/words.sh \
    $(find -type f -name "*.c" -o -name "*.h") \
    >/dev/null

real    0m7,787s
user    0m16,349s
sys     0m0,367s
...

Rewrite the sed invocations using the -e option from this:
...
   | sed <sedprog1>
   | sed <sedprog2>
...
into this:
...
   | sed \
       -e <sedprog1>
       -e <sedprog2>
...
and reduce user time to ~11s:
...
$ time ./gdb/contrib/words.sh \
    $(find -type f -name "*.c" -o -name "*.h") \
    >/dev/null

real    0m7,243s
user    0m11,220s
sys     0m0,205s
...

gdb/ChangeLog:

2019-11-22  Tom de Vries  <tdevries@suse.de>

	* contrib/words.sh: Combine sed invocations.

Change-Id: Ib08453f3712f32ed02d9f503ee960711ebb9421b
2019-11-22 16:23:22 +01:00
Tom de Vries 496af5c811 [gdb/contrib] Add words.sh script
Add a script that takes a list of files as arguments and output a list of
words from the C comments with their frequencies.

For:
...
$ ./gdb/contrib/words.sh $(find gdb -type f -name "*.c" -o -name "*.h")
...
it generates a list of ~15000 words prefixed with frequency.

This could be used to generate a dictionary that is kept as part of the
sources, against which new code can be checked, generating a warning or
error.  The hope is that misspellings would trigger this frequently, and rare
words rarely, otherwise the burden of updating the dictionary would be too
much.

And for:
...
$ ./gdb/contrib/words.sh -f 1 $(find gdb -type f -name "*.c" -o -name "*.h")
...
it generates a list of ~5000 words with frequency 1.

This can be used to scan for misspellings manually.

Change-Id: I7b119c9a4519cdbf62a3243d1df2927c80813e8b
2019-11-07 10:49:56 +01:00