The words.sh script in its current form extracts C comments from files and
transforms them into a list of words.

To use the script on the documentation (as I did for commit 6b92c0d353
"[gdb/doc] Fix typos"), I needed to disable the "extract C comments" part.

Add an option -c that enables extracting C comments and is off by default.
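
A minimal sketch of how such an option could be parsed with getopts.  The
function name parse_opts and the variable names are hypothetical, and the
actual words.sh may handle its options differently; -f is assumed to take a
frequency, as in the "-f 1" invocation shown for the original script:

```shell
# Hypothetical option parsing; not taken from words.sh.
parse_opts ()
{
    extract_comments=false   # -c not given: treat the input as plain text
    freq=                    # -f <freq>: empty means no frequency filter
    OPTIND=1                 # reset so the function can be called repeatedly

    while getopts cf: opt; do
        case "$opt" in
            c) extract_comments=true ;;
            f) freq=$OPTARG ;;
            *) echo "usage: words.sh [-c] [-f <freq>] <files>" >&2
               return 1 ;;
        esac
    done

    # Number of arguments consumed by options; the caller can "shift" these.
    nopts=$((OPTIND - 1))
}
```

Under these assumptions, "parse_opts -c foo.c" leaves extract_comments set to
true, while "parse_opts gdb.texinfo" leaves it false.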

gdb/ChangeLog:

2019-11-25  Tom de Vries  <tdevries@suse.de>

	* contrib/words.sh: Add -c option.

Change-Id: Ifa34d435b3c41b3ff845dc07ae4b0d9f02d92a2d

Remove more punctuation and quoting in the words.sh script.

gdb/ChangeLog:

2019-11-22  Tom de Vries  <tdevries@suse.de>

	* contrib/words.sh: Improve words extraction.

Change-Id: I1d9eea165731af4e6c4e1c7e09aed9b07af6395c

Currently, running words.sh on all the C source and header files in the repo
takes ~16s of user time:
...
$ time ./gdb/contrib/words.sh \
$(find -type f -name "*.c" -o -name "*.h") \
>/dev/null
real 0m7,787s
user 0m16,349s
sys 0m0,367s
...
Rewrite the piped sed invocations into a single sed invocation using the -e
option, from this:
...
| sed <sedprog1>
| sed <sedprog2>
...
into this:
...
| sed \
-e <sedprog1> \
-e <sedprog2>
...
and reduce user time to ~11s:
...
$ time ./gdb/contrib/words.sh \
$(find -type f -name "*.c" -o -name "*.h") \
>/dev/null
real 0m7,243s
user 0m11,220s
sys 0m0,205s
...
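
As a rough illustration of the rewrite, the sketch below shows the two forms
side by side.  The function names and the two sed programs are made-up
placeholders, not the ones used in words.sh:

```shell
# Two sed processes connected by a pipe.
strip_separate ()
{
    sed 's/[.,;:]//g' \
        | sed 's/"//g'
}

# One sed process that applies both programs, in order, to each line.
strip_combined ()
{
    sed \
        -e 's/[.,;:]//g' \
        -e 's/"//g'
}
```

For line-oriented substitutions like these, both forms should produce the same
output; the combined form starts fewer processes and passes the data through
fewer pipes, which is presumably where the user time is saved.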

gdb/ChangeLog:

2019-11-22  Tom de Vries  <tdevries@suse.de>

	* contrib/words.sh: Combine sed invocations.

Change-Id: Ib08453f3712f32ed02d9f503ee960711ebb9421b

Add a script that takes a list of files as arguments and outputs a list of
words from the C comments with their frequencies.
For:
...
$ ./gdb/contrib/words.sh $(find gdb -type f -name "*.c" -o -name "*.h")
...
it generates a list of ~15000 words prefixed with frequency.
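
A minimal sketch of how a frequency-prefixed word list might be produced from
text on stdin, using only standard tr, grep, sort and uniq.  The function name
count_words is hypothetical, and the actual script may split and clean up
words differently:

```shell
# Hypothetical helper: read text on stdin, print "<count> <word>" lines.
count_words ()
{
    # Turn every run of non-alphabetic characters into a single newline,
    # drop empty lines, then count identical words and sort by count.
    tr -cs '[:alpha:]' '\n' \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | sort -n
}
```

With this sketch, the least frequent words end up at the top of the output.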
Such a list could be used to generate a dictionary that is kept as part of the
sources, against which new code can be checked, generating a warning or
error for words that are not in it.  The hope is that misspellings would
trigger this check frequently, and rare but valid words only rarely;
otherwise the burden of updating the dictionary would be too high.
And for:
...
$ ./gdb/contrib/words.sh -f 1 $(find gdb -type f -name "*.c" -o -name "*.h")
...
it generates a list of ~5000 words with frequency 1.
This can be used to scan for misspellings manually.
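
A frequency filter of this kind could be sketched as below, assuming the input
consists of "<count> <word>" lines.  The function name filter_freq is
hypothetical, and whether the real -f option prints the count alongside the
word is not specified here:

```shell
# Hypothetical helper: print the words whose count equals the first argument.
filter_freq ()
{
    awk -v freq="$1" '$1 == freq { print $2 }'
}
```

For instance, piping a frequency list into "filter_freq 1" would leave only
the words that occur exactly once.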
Change-Id: I7b119c9a4519cdbf62a3243d1df2927c80813e8b