Control statements such as if, while,
and so on control the flow of execution in awk
programs. Most of the control statements in awk are
patterned on similar statements in C.
All the control statements start with special keywords such as if
and while, to distinguish them from simple
expressions.
Many control statements contain other statements; for example,
the if statement contains another statement which
may or may not be executed. The contained statement is called the body.
If you want to include more than one statement in the body, group
them into a single compound statement with curly braces, separating them with
newlines or semicolons.
The if-else statement is awk's
decision-making statement. It looks like this:
if (condition) then-body [else else-body]
condition is an expression that controls what the
rest of the statement will do. If condition is true, then-body
is executed; otherwise, else-body is executed
(assuming that the else clause is present). The else
part of the statement is optional. The condition is considered
false if its value is zero or the null string, and true otherwise.
Here is an example:
if (x % 2 == 0)
    print "x is even"
else
    print "x is odd"
In this example, if the expression x % 2 == 0 is
true (that is, the value of x is divisible by 2),
then the first print statement is executed;
otherwise, the second print statement is executed.
If the else appears on the same line as then-body,
and then-body is not a compound statement (i.e., not
surrounded by curly braces),
then a semicolon must separate then-body from else.
To illustrate this, let's rewrite the previous example:
awk '{ if (x % 2 == 0) print "x is even"; else
       print "x is odd" }'
If you forget the ;, awk won't be
able to parse the statement, and you will get a syntax error.
We would not actually write this example this way, because a
human reader might fail to see the else if it were
not the first thing on its line.
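As a minimal sketch of the points above, the following shell function wraps an awk program in which both bodies of an if-else are compound statements. The function name parity and the counters even and odd are made up for illustration, and the input is assumed to hold one integer per line.

```shell
# parity is a hypothetical helper name, not part of awk.
parity() {
    awk '{
        if ($1 % 2 == 0) {
            # compound then-body: two statements grouped by braces
            even++
            print $1, "is even"
        } else {
            # compound else-body
            odd++
            print $1, "is odd"
        }
    }
    END { print even + 0, "even,", odd + 0, "odd" }'
}

printf '4\n7\n10\n' | parity
```

Because the else follows a closing brace here rather than a simple statement, no semicolon is needed before it.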
In programming, a loop means a part of a program
that is (or at least can be) executed two or more times in
succession.
The while statement is the simplest looping
statement in awk. It repeatedly executes a statement
as long as a condition is true. It looks like this:
while (condition)
    body
Here body is a statement that we call the body
of the loop, and condition is an expression that
controls how long the loop keeps running.
The first thing the while statement does is test condition.
If condition is true, it executes the statement body.
(condition is true when the value is not zero and not
a null string.) After body
has been executed, condition is tested again, and if
it is still true, body is executed again. This process
repeats until condition is no longer true. If condition
is initially false, the body of the loop is never executed.
This example prints the first three fields of each record, one
per line.
awk '{ i = 1
       while (i <= 3) {
           print $i
           i++
       }
}'
Here the body of the loop is a compound statement enclosed in braces, containing two statements.
The loop works like this: first, the value of i
is set to 1. Then, the while tests whether i
is less than or equal to three. This is the case when i
equals one, so the i-th field is printed. Then the i++
increments the value of i and the loop repeats. The
loop terminates when i reaches 4.
As you can see, a newline is not required between the
condition and the body; but using one makes the program clearer
unless the body is a compound statement or is very simple. The
newline after the open-brace that begins the compound statement
is not required either, but the program would be hard to read
without it.
The do loop is a variation of the while
looping statement. The do loop executes the body
once, then repeats body as long as condition
is true. It looks like this:
do
    body
while (condition)
Even if condition is false at the start, body
is executed at least once (and only once, unless executing body
makes condition true). Contrast this with the
corresponding while statement:
while (condition)
    body
This statement does not execute body even once if condition
is false to begin with.
Here is an example of a do statement:
awk '{ i = 1
       do {
           print $0
           i++
       } while (i <= 10)
}'
This program prints each input record ten times. It isn't a very realistic
example, since in this case an ordinary while would
do just as well. But this reflects actual experience; there is
only occasionally a real use for a do statement.
The for statement makes it more convenient to
count iterations of a loop. The general form of the for
statement looks like this:
for (initialization; condition; increment)
    body
This statement starts by executing initialization.
Then, as long as condition is true, it repeatedly
executes body and then increment. Typically initialization
sets a variable to either zero or one, increment adds
1 to it, and condition compares it against the desired number of iterations.
Here is an example of a for statement:
awk '{ for (i = 1; i <= 3; i++)
           print $i
}'
This prints the first three fields of each input record, one field per line.
In the for statement, body stands for
any statement, but initialization, condition
and increment are just expressions. You cannot set
more than one variable in the initialization part
unless you use a multiple assignment
statement such as x = y = 0, which is possible only
if all the initial values are equal. (But you can initialize
additional variables by writing their assignments as separate
statements preceding the for loop.)
The same is true of the increment part; to
increment additional variables, you must write separate
statements at the end of the loop. The C compound expression, using C's comma operator, would be
useful in this context, but it is not supported in awk.
Most often, increment is an increment expression,
as in the example above. But this is not required; it can be any
expression whatever. For example, this statement prints all the
powers of 2 between 1 and 100:
for (i = 1; i <= 100; i *= 2)
    print i
Any of the three expressions in the parentheses following the for
may be omitted if there is nothing to be done there. Thus,
for (; x > 0;) is equivalent to while (x > 0). If the
condition is omitted, it is treated as true, effectively
yielding an infinite loop (i.e., a loop that will never
terminate).
In most cases, a for loop is an abbreviation for
a while loop, as shown here:
initialization
while (condition) {
    body
    increment
}
The only exception is when the continue statement
(see section The continue
Statement) is used inside the loop; changing a for statement to a while statement in this
way can change the effect of the continue statement
inside the loop.
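To make the correspondence concrete, here is a small sketch that writes the same loop both ways. The shell function names fields_for and fields_while are hypothetical, chosen only for this illustration; each prints the first three fields of every record, one per line.

```shell
# fields_for and fields_while are hypothetical helper names.
fields_for() {
    awk '{ for (i = 1; i <= 3; i++)
               print $i
    }'
}

fields_while() {
    awk '{ i = 1                 # initialization
           while (i <= 3) {      # condition
               print $i          # body
               i++               # increment
           }
    }'
}

echo 'a b c d' | fields_for
echo 'a b c d' | fields_while
```

Both calls print the same three lines, since neither loop body contains a continue statement.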
There is an alternate version of the for loop,
for iterating over all the indices of an array:
for (i in array)
    do something with array[i]
See section Arrays in awk, for more information on this version of the for
loop.
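As a brief sketch of this form of the loop, the following counts how often each value appears in the first field and then walks the array with for (key in count). The function name count_first and the array name count are assumptions made for this example. The order in which awk visits the indices is not specified, so the output is piped through sort to make it predictable.

```shell
# count_first is a hypothetical helper name.
count_first() {
    awk '{ count[$1]++ }
    END {
        for (key in count)
            print key, count[key]
    }' | sort
}

printf 'apple 1\npear 2\napple 3\n' | count_first
```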
The awk language has a for statement
in addition to a while statement because often a for
loop is both less work to type and more natural to think of.
Counting the number of
iterations is very common in loops. It can be easier to think of
this counting as part of looping rather than as something to do
inside the loop.
The next section has more complicated examples of for
loops.
The break statement jumps out of the innermost for, while,
or do-while loop that encloses it. The
following example finds the smallest divisor of any integer, and
also identifies prime numbers:
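awk '# find smallest divisor of num
     { num = $1
       for (div = 2; div*div <= num; div++)
           if (num % div == 0)
               break
       if (num % div == 0)
           printf "Smallest divisor of %d is %d\n", num, div
       else
           printf "%d is prime\n", num }'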
When the remainder is zero in the first if
statement, awk immediately breaks out of
the containing for loop. This means that awk
proceeds immediately to the statement following the loop and
continues processing. (This is very different from the exit
statement, which stops the entire awk program. See
section The exit
Statement.)
Here is another program equivalent to the previous one. It illustrates how
the condition of a for or while
could just as well be replaced with a break inside
an if:
awk '# find smallest divisor of num
     { num = $1
       for (div = 2; ; div++) {
           if (num % div == 0) {
               printf "Smallest divisor of %d is %d\n", num, div
               break
           }
           if (div*div > num) {
               printf "%d is prime\n", num
               break
           }
       }
}'
The continue statement, like break,
is used only inside for, while, and do-while
loops. It skips over the rest of the loop body, causing the next
cycle around the loop to begin immediately. Contrast this with break,
which jumps out of the loop altogether. Here is an example:
# print names that don't contain the string "ignore"

# first, save the text of each line
{ names[NR] = $0 }

# print what we're interested in
END {
    for (x in names) {
        if (names[x] ~ /ignore/)
            continue
        print names[x]
    }
}
If one of the input records contains the string ignore, this
example skips the print statement for that record, and continues
back to the first statement in the loop.
This is not a practical example of continue,
since it would be just as easy to write the loop like this:
for (x in names)
    if (names[x] !~ /ignore/)
        print names[x]
The continue statement in a for loop
directs awk to skip the rest of the body of the
loop, and resume execution with the increment-expression of the for
statement. The following program illustrates this fact:
awk 'BEGIN {
     for (x = 0; x <= 20; x++) {
         if (x == 5)
             continue
         printf ("%d ", x)
     }
     print ""
}'
This program prints all the numbers from 0 to 20, except for
5, for which the printf is skipped. Since the
increment x++ is not skipped, x does
not remain stuck at 5. Contrast the for loop above
with the while loop:
awk 'BEGIN {
     x = 0
     while (x <= 20) {
         if (x == 5)
             continue
         printf ("%d ", x)
         x++
     }
     print ""
}'
This program loops forever once x gets to 5.
As described above, the continue statement has no
meaning when used outside the body of a loop. However, although
it was never documented, historical implementations of awk
have treated the continue statement outside of a
loop as if it were a next statement (see section The next Statement).
By default, gawk silently supports this usage. However, if -W posix
has been specified on the command line (see section Invoking awk),
it is treated as an error, since the POSIX standard
specifies that continue should only be used inside
the body of a loop.
The next statement forces awk to
immediately stop processing the current record and go on to the
next record. This means that no further rules are executed for
the current record. The rest of the current rule's action is not executed either.
Contrast this with the effect of the getline function (see section Explicit Input with getline).
That too causes awk
to read the next record immediately, but it does not alter the
flow of control in any way. So the rest of the current action executes with a new input
record.
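The difference can be seen in a small sketch. The function names with_next and with_getline and the sample input are invented for this illustration. In the first program, next abandons the matching record, so no rule sees it. In the second, getline replaces the record and the rest of the same action, followed by the second rule, runs on the new one.

```shell
# with_next and with_getline are hypothetical helper names.
with_next() {
    awk '/header/ { next }
         { print "rule 2:", $0 }'
}

with_getline() {
    awk '/header/ { getline; print "rule 1 continues with:", $0 }
         { print "rule 2:", $0 }'
}

printf 'header\nbody\ntail\n' | with_next
printf 'header\nbody\ntail\n' | with_getline
```

With next, only the two "rule 2:" lines for body and tail are printed; with getline, an additional "rule 1 continues with: body" line comes first.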
At the highest level, awk program execution is a
loop that reads an input record and then tests each rule's pattern against it. If you think
of this loop as a for statement whose body contains
the rules, then the next statement is analogous to a continue
statement: it skips to the end of the body of this implicit loop,
and executes the increment (which reads another record).
For example, if your awk program works only on
records with four fields, and you don't want it to fail when
given bad input, you might use this rule
near the beginning of the program:
NF != 4 {
    printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
    next
}
so that the following rules will not see the bad record. The
error message is redirected to the standard error output stream,
as error messages should be. See section Standard I/O Streams.
According to the POSIX standard, the behavior is undefined if
the next statement is used in a BEGIN
or END rule. gawk
will treat it as a syntax error.
If the next statement causes the end of the input
to be reached, then the code in the END rules, if
any, will be executed. See section BEGIN
and END Special Patterns.
The next file statement is similar to the next
statement. However, instead of abandoning processing of the
current record, the next file statement instructs awk
to stop processing the current data file.
Upon execution of the next file statement, FILENAME
is updated to the name of the next data file listed on the
command line, FNR is reset to 1, and processing
starts over with the first rule
in the program. See section Built-in
Variables.
If the next file statement causes the end of the
input to be reached, then the code in the END rules,
if any, will be executed. See section BEGIN
and END Special Patterns.
The next file statement is a gawk
extension; it is not (currently) available in any other awk
implementation. You can simulate its behavior by creating a
library file named nextfile.awk, with the following
contents. (This sample program uses user-defined functions, a
feature that has not been presented yet. See section User-defined Functions, for more
information.)
# nextfile --- function to skip remaining records in current file

# this should be read in before the "main" awk program

function nextfile()    { _abandon_ = FILENAME; next }

_abandon_ == FILENAME && FNR > 1     { next }
_abandon_ == FILENAME && FNR == 1    { _abandon_ = "" }
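The nextfile function sets the variable _abandon_ to the name of the
current data file and then skips to the next
record. Since this file is read before the main awk
program, the rules that follow the function
definition will be executed before the rules in the main program.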
The first rule continues to
skip records as long as the name of the input file has not
changed, and this is not the first record in the file. This rule is sufficient most of the
time. But what if the same data file is named twice in a
row on the command line? This rule
would not process the data file the second time. The second rule catches this case: If the
data file name is what was being skipped, but FNR is
1, then this is the second time the file is being processed, and
it should not be skipped.
The next file statement is useful if you
have many data files to process and, due to the nature of the
data, you do not expect to want every record
in each file. Without it, moving on to the next data file means
continuing to scan the unwanted records (as described
above). The next file statement accomplishes this
much more efficiently.
The exit statement causes awk to
immediately stop executing the current rule and to stop processing input;
any remaining input is ignored.
If an exit statement is executed from a BEGIN rule, the program stops processing
everything immediately. No input records are read. However, if an END rule is present, it is executed
(see section BEGIN and END
Special Patterns). If exit is used as part of an END rule, it causes the program to
stop immediately.
An exit statement that is part of an ordinary rule (that is, not part of a BEGIN
or END rule) stops
the execution of any further automatic rules, but the END rule is executed if there is one.
If you do not want the END rule to do its job in this case,
you can set a variable to nonzero before the exit
statement, and check that variable in the END rule.
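The flag technique just described might look like the following sketch. The function name sum_or_fail and the variable names failed and total are illustrative assumptions; the input is assumed to hold one value per line, and any value that is not a string of digits counts as an error.

```shell
# sum_or_fail is a hypothetical helper name; failed is an illustrative flag.
sum_or_fail() {
    awk '$1 !~ /^[0-9]+$/ { failed = 1; exit }    # set the flag, then exit
         { total += $1 }
         END {
             if (failed)
                 print "bad input, no total"
             else
                 print "total:", total
         }'
}

printf '1\n2\n3\n' | sum_or_fail
printf '1\nx\n3\n' | sum_or_fail
```

The END rule still runs after exit, but the flag lets it skip its normal work.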
If an argument is supplied to exit, its value is
used as the exit status code for the awk process. If
no argument is supplied, exit returns status zero
(success).
For example, let's say you've discovered an error condition
you really don't know how to handle. Conventionally, programs
report this by exiting with a nonzero status. Your awk
program can do this using an exit statement with a
nonzero argument. Here's an example of this:
BEGIN {
    if (("date" | getline date_now) <= 0) {
        print "Can't get system date" > "/dev/stderr"
        exit 4
    }
}
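From the calling shell, the status passed to exit can be inspected in the usual way. The following is a small sketch under stated assumptions: the function name reject_blank and the status value 2 are arbitrary choices for this illustration. The awk program exits with status 2 at the first empty record, and otherwise reaches the end of the input and returns zero.

```shell
# reject_blank is a hypothetical helper name; the status value 2 is arbitrary.
reject_blank() {
    awk 'NF == 0 { exit 2 }'
}

if printf 'a\n\nb\n' | reject_blank; then
    echo "no blank lines"
else
    echo "awk exited with status $?"
fi
```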