The wc (word count) utility counts lines, words, and characters in
one or more input files. Its usage is as follows:
wc [-lwc] [ files ... ]
If no files are specified on the command line, wc reads its standard
input. If there are multiple files, it also prints total counts for all
the files. The options and their meanings are shown in the following list:
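-l
Count only lines.
-w
Count only words. A "word" is a contiguous sequence of nonwhitespace
characters, separated by spaces and/or TABs. This is also how awk separates
fields in its input data.
-c
Count only characters.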
Implementing wc in awk is particularly elegant,
since awk does a lot of the work for us; it splits lines into
words (i.e., fields) and counts them, it counts lines (i.e., records),
and it can easily tell us how long a line is.
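As a small illustration of how much awk provides for free, the following sketch prints, for each input line, the record number (NR), the number of fields (NF), and the record length. The sample text and the shell variable name per_line are made up for this example; they are not part of wc.awk.

```sh
# Sketch: the three built-in values that a wc clone relies on.
per_line=$(printf 'the quick brown fox\njumps over\n' | awk '
    # NR: record (line) number, NF: field (word) count, length($0): record length
    { print NR, NF, length($0) }
')
echo "$per_line"
```

The complete program developed in the rest of this section adds option handling, per-file counts, and a grand total on top of these values.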
The program uses the getopt library function
(see Processing Command-Line Options)
and the file-transition functions
(see Noting Data File Boundaries).
This version has one notable difference from traditional versions of
wc: it always prints the counts in the order lines, words,
and characters. Traditional versions note the order of the -l,
-w, and -c options on the command line, and print the
counts in that order.
The BEGIN rule does the argument processing. The variable
print_total is true if more than one file is named on the
command line:
# wc.awk --- count lines, words, characters

# Options:
#    -l    only count lines
#    -w    only count words
#    -c    only count characters
#
# Default is to count lines, words, characters
#
# Requires getopt and file transition library functions

BEGIN {
    # let getopt print a message about
    # invalid options. we ignore them
    while ((c = getopt(ARGC, ARGV, "lwc")) != -1) {
        if (c == "l")
            do_lines = 1
        else if (c == "w")
            do_words = 1
        else if (c == "c")
            do_chars = 1
    }
    for (i = 1; i < Optind; i++)
        ARGV[i] = ""

    # if no options, do all
    if (! do_lines && ! do_words && ! do_chars)
        do_lines = do_words = do_chars = 1

    # i == Optind here, so ARGC - i is the number of file arguments
    print_total = (ARGC - i > 1)
}
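The file-count arithmetic can be checked in isolation. The sketch below assumes that no options are given, so a local variable i with the value 1 stands in for Optind; the function name count_files and the file names are hypothetical. Because the awk program has only a BEGIN rule, the named files are never opened and need not exist.

```sh
# Sketch: how many file operands remain after option processing?
# Assumption: no options are given, so i stands in for Optind with the value 1.
count_files() {
    awk 'BEGIN {
        i = 1              # hypothetical stand-in for Optind
        print ARGC - i     # ARGV[0] is the program name, so this counts the rest
    }' "$@"
}

one=$(count_files a.txt)          # a single file operand
two=$(count_files a.txt b.txt)    # two file operands
echo "$one $two"
```

A "total" line is wanted only when this count is greater than one.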
The beginfile function is simple; it just resets the counts of lines,
words, and characters to zero, and saves the current file name in
fname:
function beginfile(file)
{
    chars = lines = words = 0
    fname = FILENAME
}
The endfile function adds the current file's numbers to the running
totals of lines, words, and characters.(1) It then prints out those numbers
for the file that was just read. It relies on beginfile to reset the
numbers for the following data file:
function endfile(file)
{
    tchars += chars
    tlines += lines
    twords += words
    if (do_lines)
        printf "\t%d", lines
    if (do_words)
        printf "\t%d", words
    if (do_chars)
        printf "\t%d", chars
    printf "\t%s\n", fname
}
There is one rule that is executed for each line. It adds the length of
the record, plus one, to chars. The extra one is needed because the
newline character separating records (the value of RS) is not part of
the record itself, and thus not included in its length. Next, lines is
incremented for each line read, and words is incremented by the value
of NF, which is the number of "words" on this line:
# do per line
{
    chars += length($0) + 1    # get newline
    lines++
    words += NF
}
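To see why the extra one matters, the following sketch sums record lengths with and without it and compares both results against the byte count reported by the system wc. It assumes plain ASCII input whose last line ends in a newline; the helper function sample and the variable names are made up for illustration. With multibyte characters, or with a final line that lacks a newline, the numbers may not agree.

```sh
# Sketch: record lengths alone miss one character per line (the newline).
sample() { printf 'one two\nthree\n'; }

without_nl=$(sample | awk '{ n += length($0) }     END { print n }')
with_nl=$(sample    | awk '{ n += length($0) + 1 } END { print n }')
actual=$(sample | wc -c | tr -d ' ')    # strip padding that some wc versions add

echo "without: $without_nl  with: $with_nl  wc -c: $actual"
```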
Finally, the END rule simply prints the totals for all the files:
END {
    if (print_total) {
        if (do_lines)
            printf "\t%d", tlines
        if (do_words)
            printf "\t%d", twords
        if (do_chars)
            printf "\t%d", tchars
        print "\ttotal"
    }
}
(1) wc can't just use the value of FNR in endfile. If you examine the
code in Noting Data File Boundaries, you will see that FNR has already
been reset by the time endfile is called.
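The effect can be observed with a small sketch. The rule below is only an assumed stand-in for the library's file-transition rule, not the library code itself, and the temporary files and variable names are made up for this example. A change of file can be noticed only once the first record of the next file has been read, and by then FNR has started over.

```sh
# Sketch: by the time a change of file is detected, FNR belongs to the new file.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/one"
printf 'x\n' > "$tmp/two"

seen=$(awk '
    # Assumed stand-in for the library rule: detect the change of file.
    FILENAME != prev {
        if (prev != "")
            print "leaving " nlines " lines, FNR is now " FNR
        prev = FILENAME
    }
    { nlines = FNR }    # remember the count ourselves, as wc.awk does with lines
' "$tmp/one" "$tmp/two")

rm -r "$tmp"
echo "$seen"
```

This is why wc.awk keeps its own lines counter rather than reading FNR in endfile.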