In the typical awk program, all input is read
either from the standard input (by default the keyboard, but
often a pipe from another command) or from files whose names you
specify on the awk command line. If you specify
input files, awk reads them in order, reading all
the data from one before going on to the next. The name of the
current input file can be found in the built-in variable FILENAME
(see section Built-in Variables).
The input is read in units called records, and processed by
the rules one record at a time. By default, each record is one
line. Each record is split automatically into fields, to make it
more convenient for a rule to
work on its parts.
On rare occasions you will need to use the getline
command, which can do explicit input from any number of files
(see section Explicit Input with getline).
* Records:: Controlling how data is split into records.
* Fields:: An introduction to fields.
* Non-Constant Fields:: Non-constant Field Numbers.
* Changing Fields:: Changing the Contents of a Field.
* Field Separators:: The field separator and how to change it.
* Constant Size:: Reading constant width data.
* Multiple Line:: Reading multi-line records.
* Getline:: Reading files under explicit program control
using the getline function.
* Close Input:: Closing an input file (so you can read from
the beginning once more).
The awk language divides its input into records
and fields. Records are separated by a character called the record
separator. By default, the record separator is the newline
character, defining a record to be a single line of text.
Sometimes you may want to use a different character to
separate your records. You can use a different character by
changing the built-in variable RS. The value of RS
is a string that says how to
separate records; the default value is "\n",
the string containing just a
newline character. This is why records are, by default, single
lines.
RS can have any string
as its value, but only the first character of the string is used as the record
separator. The other characters are ignored. RS is
exceptional in this regard; awk uses the full value
of all its other built-in variables.
You can change the value of RS in the awk
program with the assignment
operator, = (see section Assignment
Expressions). The new record-separator character should be
enclosed in quotation marks to make a string constant. Often the right
time to do this is at the beginning of execution, before any
input has been processed, so that the very first record will be
read with the proper separator. To do this, use the special BEGIN pattern (see section BEGIN and END
Special Patterns). For example:
awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list
changes the value of RS to "/",
before reading any input. This is a string
whose first character is a slash; as a result, records are
separated by slashes. Then the input file is read, and the second rule in the awk
program (the action with no pattern) prints each record. Since
each print statement adds a newline at the end of
its output, the effect of this awk program is to
copy the input with each slash changed to a newline.
Another way to change the record separator is on the command
line, using the variable-assignment
feature (see section Command
Line Options):
awk '{ print $0 }' RS="/" BBS-list
This sets RS to / before processing BBS-list.
Reaching the end of an input file terminates the current input
record, even if the last character in the file is not the
character in RS.
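The two points above (a one-character RS, and end of file closing the last record) can be tried without the BBS-list file. The following is only a sketch: the input string and the shell variable rs_out are made up for illustration.

```shell
# Hypothetical sample input: three items separated by "/",
# with no "/" and no newline after the last one.
rs_out=$(printf 'alpha/beta/gamma' |
    awk 'BEGIN { RS = "/" } { print NR ": " $0 }')
printf '%s\n' "$rs_out"
# 1: alpha
# 2: beta
# 3: gamma
```

The third record is still delivered, even though the input does not end with the separator.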
The empty string, ""
(a string of no characters),
has a special meaning as the value of RS: it means
that records are separated only by blank lines. See section Multiple-Line Records, for more
details.
The awk utility keeps track of the number of records that have been
read so far from the current input file. This value is stored in
a built-in variable called FNR. It is reset to zero
when a new file is started. Another built-in variable, NR,
is the total number of input
records read so far from all files. It starts at zero but is
never automatically reset to zero.
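A small experiment can make the difference between FNR and NR concrete. This sketch assumes a mktemp utility is available; the file names one.txt and two.txt and the variable nr_out are invented for the example.

```shell
# Two throwaway input files with two records each (hypothetical names).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\n' > "$dir/two.txt"

# Print the per-file count (FNR) next to the running total (NR).
nr_out=$(awk '{ print FNR, NR }' "$dir/one.txt" "$dir/two.txt")
rm -rf "$dir"
printf '%s\n' "$nr_out"
# 1 1
# 2 2
# 1 3
# 2 4
```

FNR starts over at the second file, while NR keeps counting.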
If you change the value of RS in the middle of an awk
run, the new value is used to delimit subsequent records, but the
record currently being processed (and records already processed)
are not affected.
When awk reads an input record, the record is
automatically separated or parsed by the interpreter
into chunks called fields. By default, fields are
separated by whitespace, like
words in a line. Whitespace in awk
means any string of one or more
spaces and/or tabs; other characters such as newline, formfeed,
and so on, that are considered whitespace
by other languages are not considered whitespace by awk.
The purpose of fields is to make it more convenient for you to
refer to these pieces of the record. You don't have to use
them---you can operate on the whole record if you wish---but
fields are what make simple awk programs so
powerful.
To refer to a field in an awk
program, you use a dollar-sign, $, followed by the number of the field you want. Thus, $1
refers to the first field, $2
to the second, and so on. For example, suppose the following is a
line of input:
This seems like a pretty nice example.
Here the first field, or $1,
is This; the second field,
or $2, is seems; and so on. Note that
the last field, $7,
is example.. Because there is no space between the e
and the ., the period is considered part of the
seventh field.
No matter how many fields there are, the last field in a record can be
represented by $NF. So, in the example above, $NF
would be the same as $7, which is example..
Why this works is explained below (see section Non-constant Field Numbers). If you
try to refer to a field beyond
the last one, such as $8 when the record has only 7
fields, you get the empty string.
Plain NF, with no $, is a built-in
variable whose value is the number
of fields in the current record.
$0, which looks like an attempt to refer to the
zeroth field, is a special
case: it represents the whole input record. This is what you
would use if you weren't interested in fields.
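The sample line used above can be fed to awk directly to check these statements. This is a minimal sketch; the shell variable fields_out is just a convenient name for the example.

```shell
# Print the field count, the first field, the last field,
# and the length of a field beyond the last one.
fields_out=$(echo 'This seems like a pretty nice example.' |
    awk '{ print NF; print $1; print $NF; print length($8) }')
printf '%s\n' "$fields_out"
# 7
# This
# example.
# 0
```

The final 0 shows that $8, which lies beyond the last field, is the empty string.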
Here are some more examples:
awk '$1 ~ /foo/ { print $0 }' BBS-list
This example prints each record in the file BBS-list
whose first field contains the string foo. The
operator ~ is called a matching operator
(see section Comparison Expressions);
it tests whether a string
(here, the field $1)
matches a given regular expression.
By contrast, the following example:
awk '/foo/ { print $1, $NF }' BBS-list
looks for foo in the entire record and
prints the first field and the
last field for each input
record containing a match.
The number of a field does not need to be a
constant. Any expression in the awk language can be
used after a $ to refer to a field. The value of the expression
specifies the field number. If the value is a string, rather than a number, it is converted to a number. Consider this example:
awk '{ print $NR }'
Recall that NR is the number of records read so far: 1
in the first record, 2 in the second, etc. So this example prints
the first field of the first
record, the second field of the
second record, and so on. For the twentieth record, field number 20 is printed; most likely,
the record has fewer than 20 fields, so this prints a blank line.
Here is another example of using expressions as field numbers:
awk '{ print $(2*2) }' BBS-list
The awk language must evaluate the expression (2*2)
and use its value as the number
of the field to print. The *
sign represents multiplication, so the expression 2*2
evaluates to 4. The parentheses are used so that the
multiplication is done before the $ operation; they
are necessary whenever there is a binary operator in the field-number expression. This example,
then, prints the hours of operation (the fourth field) for every line of the file BBS-list.
If the field number you compute is zero, you
get the entire record. Thus, $(2-2) has the same
value as $0. Negative field
numbers are not allowed.
The number of fields in the
current record is stored in the built-in variable NF
(see section Built-in Variables).
The expression $NF is not a special feature: it is
the direct consequence of evaluating NF and using
its value as a field number.
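As an illustration of computed field numbers, the sketch below uses a made-up six-field input line rather than BBS-list. The variable i and the shell variable expr_out are assumptions of the example.

```shell
# $i uses a variable as the field number; the parentheses are needed
# around 2*2 and NF-1 because they contain binary operators.
expr_out=$(echo 'a b c d e f' |
    awk '{ i = 2; print $i, $(2*2), $(NF-1), $NF }')
printf '%s\n' "$expr_out"
# b d e f
```

Without the parentheses, $NF-1 would mean "the value of the last field, minus one", which is a different thing.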
You can change the contents of a field
as seen by awk within an awk program;
this changes what awk perceives as the current input
record. (The actual input is untouched: awk never
modifies the input file.)
Consider this example:
awk '{ $3 = $2 - 10; print $2, $3 }' inventory-shipped
The - sign represents subtraction, so this
program reassigns field three, $3,
to be the value of field two
minus ten, $2 - 10. (See section Arithmetic Operators.) Then field two, and the new value for field three, are printed.
In order for this to work, the text in field $2 must make
sense as a number; the string of characters must be
converted to a number in order
for the computer to do arithmetic on it. The number resulting from the
subtraction is converted back to a string
of characters which then becomes field
three. See section Conversion of
Strings and Numbers.
When you change the value of a field
(as perceived by awk), the text of the input record
is recalculated to contain the new field
where the old one was. Therefore, $0 changes to
reflect the altered field.
Thus,
awk '{ $2 = $2 - 10; print $0 }' inventory-shipped
prints a copy of the input file, with 10 subtracted from the
second field of each line.
You can also assign contents to fields that are out of range.
For example:
awk '{ $6 = ($5 + $4 + $3 + $2) ; print $6 }' inventory-shipped
We've just created $6, whose value is the sum of
fields $2, $3, $4, and $5.
The + sign represents addition. For the file inventory-shipped, $6
represents the total number of
parcels shipped for a particular month.
Creating a new field changes
the internal awk copy of the current input
record---the value of $0. Thus, if you do print
$0 after adding a field,
the record printed includes the new field,
with the appropriate number of field separators between it and
the previously existing fields.
This recomputation affects and is affected by several features
not yet discussed, in particular, the output field separator, OFS,
which is used to separate the fields (see section Output Separators), and NF
(the number of fields; see
section Examining Fields). For
example, the value of NF is set to the number of the highest field you create.
Note, however, that merely referencing an
out-of-range field does not
change the value of either $0 or NF.
Referencing an out-of-range field
merely produces a null string.
For example:
if ($(NF+1) != "")
    print "can't happen"
else
    print "everything is normal"
should print everything is normal, because NF+1
is certain to be out of range. (See section The if Statement, for more
information about awk's if-else statements.)
It is important to note that assigning to an existing field will change the value of $0,
but will not change the value of NF, even when you
assign the null string to a field. For example:
echo a b c d | awk '{ OFS = ":"; $2 = "" ; print ; print NF }'
prints
a::c:d
4
The field is still there, it
just has an empty value. You can tell because there are two
colons in a row.
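The opposite case, assigning to a field beyond the last one, can be sketched the same way. The three-field input and the choice of "-" for OFS are made up here so that the empty field created along the way is easy to see.

```shell
# Assigning to $5 in a three-field record creates an empty $4 as well,
# rebuilds $0 with OFS between the fields, and raises NF to 5.
extend_out=$(echo 'a b c' |
    awk 'BEGIN { OFS = "-" } { $5 = "e"; print; print NF }')
printf '%s\n' "$extend_out"
# a-b-c--e
# 5
```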
(This section is rather long; it describes one of the most
fundamental operations in awk. If you are a novice
with awk, we recommend that you re-read this section
after you have studied the section on regular expressions,
section Regular Expressions as
Patterns.)
The way awk splits an input record into fields is
controlled by the field
separator, which is a single character or a regular
expression. awk scans the input record for matches
for the separator; the fields themselves are the text between the
matches. For example, if the field
separator is oo, then the following line:
moo goo gai pan
would be split into three fields: "m", " g",
and " gai pan".
The field separator is
represented by the built-in variable FS. Shell
programmers take note! awk does not use the name IFS
which is used by the shell.
You can change the value of FS in the awk
program with the assignment
operator, = (see section Assignment
Expressions). Often the right time to do this is at the
beginning of execution, before any input has been processed, so
that the very first record will be read with the proper
separator. To do this, use the special BEGIN pattern (see section BEGIN and END
Special Patterns). For example, here we set the value of FS to the string ",":
awk 'BEGIN { FS = "," } ; { print $2 }'
Given the input line,
John Q. Smith, 29 Oak St., Walamazoo, MI 42139
this awk program extracts the string " 29 Oak St.".
Sometimes your input data will contain separator characters
that don't separate fields the way you thought they would. For
instance, the person's name in the example we've been using might
have a title or suffix attached, such as John Q. Smith,
LXIX. From input containing such a name:
John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139
the previous sample program would extract " LXIX",
instead of " 29 Oak St.". If you were expecting the
program to print the address, you would be surprised. So choose
your data layout and separator characters carefully to prevent
such problems.
As you know, by default, fields are separated by whitespace sequences (spaces and
tabs), not by single spaces: two spaces in a row do not delimit
an empty field. The default
value of the field separator is
a string " "
containing a single space. If this value were interpreted in the
usual way, each space character would separate fields, so two
spaces in a row would make an empty field
between them. The reason this does not happen is that a single
space as the value of FS is a special case: it is
taken to specify the default manner of delimiting fields.
If FS is any other single character, such as ",",
then each occurrence of that character separates two fields. Two
consecutive occurrences delimit an empty field. If the character occurs at
the beginning or the end of the line, that too delimits an empty field. The space character is the
only single character which does not follow these rules.
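To see the empty fields that a single-character separator produces, one can print every field in brackets. This is only a sketch; the input line and the shell variable comma_out are invented for the example.

```shell
# The input has a leading comma, two commas in a row, and a trailing comma.
comma_out=$(echo ',a,,b,' |
    awk -F, '{ printf "%d:", NF
               for (i = 1; i <= NF; i++) printf " [%s]", $i
               print "" }')
printf '%s\n' "$comma_out"
# 5: [] [a] [] [b] []
```

Four commas give five fields, three of which are empty.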
More generally, the value of FS may be a string containing any regular
expression. Then each match in the record for the regular
expression separates fields. For example, the assignment:
FS = ", \t"
makes every area of an input line that consists of a comma
followed by a space and a tab, into a field separator. (\t
stands for a tab.)
For a less trivial example of a regular expression, suppose
you want single spaces to separate fields the way single commas
were used above. You can set FS to "[ ]".
This regular expression matches a single space
and nothing else.
FS can be set on the command line. You use the -F
argument to do so. For example:
awk -F, 'program' input-files
sets FS to be the , character.
Notice that the argument uses a capital F. Contrast
this with -f, which specifies a file containing an awk
program. Case is significant in command options: the -F
and -f options have nothing to do with each other.
You can use both options at the same time to set the FS
argument and get an awk program from a
file.
The value used for the argument to -F is
processed in exactly the same way as assignments to the built-in
variable FS. This means that if the field separator contains special
characters, they must be escaped appropriately. For example, to
use a \ as the field
separator, you would have to type:
# same as FS = "\\"
awk -F\\\\ '...' files ...
Since \ is used for quoting in the shell, awk
will see -F\\. Then awk processes the \\
for escape characters (see section Constant
Expressions), finally yielding a single \ to be
used for the field separator.
As a special case, in compatibility mode (see section Invoking awk), if the argument to -F is t,
then FS is set to the tab character. (This is
because if you type -F\t, without the quotes, at the
shell, the \ gets deleted, so awk
figures that you really want your fields to be separated with
tabs, and not ts. Use -v FS="t"
on the command line if you really do want to separate your fields
with ts.)
For example, let's use an awk program file called baud.awk
that contains the pattern /300/,
and the action print $1.
Here is the program:
/300/ { print $1 }
Let's also set FS to be the -
character, and run the program on the file BBS-list. The
following command prints a list of the names of the bulletin
boards that operate at 300 baud and the first three digits of
their phone numbers:
awk -F- -f baud.awk BBS-list
It produces this output:
aardvark 555
alpo
barfly 555
bites 555
camelot 555
core 555
fooey 555
foot 555
macfoo 555
sdace 555
sabafoo 555
Note the second line of output. If you check the original
file, you will see that the second line looked like this:
alpo-net 555-3412 2400/1200/300 A
The - as part of the system's name was used as
the field separator, instead of
the - in the phone number
that was originally intended. This demonstrates why you have to
be careful in choosing your field
and record separators.
The following program searches the system password file, and
prints the entries for users who have no password:
awk -F: '$2 == ""' /etc/passwd
Here we use the -F option on the command line to
set the field separator. Note
that fields in /etc/passwd are separated by colons. The
second field represents a
user's encrypted password, but if the field is empty, that user has no
password.
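If you would rather not experiment on the real password file, the same test can be tried on made-up data. In this sketch the three colon-separated records are invented, and the program prints only the user name instead of the whole entry.

```shell
# Hypothetical passwd-like records; only "guest" has an empty second field.
nopw_out=$(printf 'root:x:0:0\nguest::500:500\ndaemon:*:1:1\n' |
    awk -F: '$2 == "" { print $1 }')
printf '%s\n' "$nopw_out"
# guest
```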
According to the POSIX standard, awk is supposed
to behave as if each record is split into fields at the time that
it is read. In particular, this means that you can change the
value of FS after a record is read, but before any
of the fields are referenced. The value of the fields (i.e. how
they were split) should reflect the old value of FS,
not the new one.
However, many implementations of awk do not do
this. Instead, they defer splitting the fields until a field reference actually happens,
using the current value of FS! This
behavior can be difficult to diagnose. The following example
illustrates the results of the two methods. (The sed
command prints just the first line of /etc/passwd.)
sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'
will usually print
root
on an incorrect implementation of awk, while gawk
will print something like
root:nSijPlPhZZwgE:0:0:Root:/:
There is an important difference between the two cases of FS = " "
(a single blank) and FS = "[ \t]+"
(which is a regular expression matching one or
more blanks or tabs). For both values of FS, fields
are separated by runs of blanks and/or tabs. However, when the
value of FS is " ", awk
will strip leading and trailing whitespace
from the record, and then decide where the fields are.
For example, the following expression prints b:
echo ' a b c d ' | awk '{ print $2 }'
However, the following prints a:
echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" } ; { print $2 }'
In this case, the first field
is null.
The stripping of leading and trailing whitespace also comes into play
whenever $0 is recomputed. For instance, this
pipeline
echo ' a b c d' | awk '{ print; $2 = $2; print }'
produces this output:
 a b c d
a b c d
The first print statement prints the record as it
was read, with leading whitespace
intact. The assignment to $2
rebuilds $0 by concatenating $1 through $NF
together, separated by the value of OFS. Since the
leading whitespace was ignored
when finding $1, it is not part of the new $0.
Finally, the last print statement prints the new $0.
The following table summarizes how fields are split, based on
the value of FS.
FS == " "
Fields are separated by runs of whitespace. Leading
and trailing whitespace
are ignored. This is the default.
FS == any single character
Fields are separated by each occurrence of the character.
Multiple successive occurrences delimit empty fields, as
do leading and trailing occurrences.
FS == regexp
Fields are separated by occurrences of characters that
match regexp.
Leading and trailing matches of regexp delimit
empty fields.
(This section discusses an advanced, experimental feature. If
you are a novice awk user, you may wish to skip it
on the first reading.)
gawk 2.13 introduced a new facility for dealing
with fixed-width fields with no distinctive field separator. Data of this
nature arises typically in one of at least two ways: the input
for old FORTRAN programs where numbers are run together, and the
output of programs that did not anticipate the use of their
output as input for other programs.
An example of the latter is a table where all the columns are
lined up by the use of a variable number
of spaces and empty fields are just spaces. Clearly, awk's
normal field splitting based on FS
will not work well in this case. (Although a portable awk
program can use a series of substr calls on $0,
this is awkward and inefficient for a large number of fields.)
The splitting of an input record into fixed-width fields is
specified by assigning a string
containing space-separated numbers to the built-in variable FIELDWIDTHS.
Each number specifies the width
of the field including
columns between fields. If you want to ignore the columns between
fields, you can specify the width as a separate field that is subsequently
ignored.
The following data is the output of the w
utility. It is useful to illustrate the use of FIELDWIDTHS.
 10:06pm  up 21 days, 14:04,  23 users
User     tty       login@  idle   JCPU   PCPU  what
hzuo     ttyV0     8:58pm            9      5  vi p24.tex
hzang    ttyV3     6:37pm    50                -csh
eklye    ttyV5     9:53pm            7      1  em thes.tex
dportein ttyV6     8:17pm  1:47                -csh
gierd    ttyD3    10:00pm     1                elm
dave     ttyD4     9:47pm            4      4  w
brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
dave     ttyq4    26Jun9115days     46     46  wnewmail
The following program takes the above input, converts the idle
time to number of seconds and
prints out the first two fields and the calculated idle time.
(This program uses a number of awk
features that haven't been introduced yet.)
BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
NR > 2 {
    idle = $4
    sub(/^ */, "", idle)   # strip leading spaces
    if (idle == "") idle = 0
    if (idle ~ /:/) { split(idle, t, ":"); idle = t[1] * 60 + t[2] }
    if (idle ~ /days/) { idle *= 24 * 60 * 60 }
    print $1, $2, idle
}
Here is the result of running the program on the data:
hzuo ttyV0 0
hzang ttyV3 50
eklye ttyV5 0
dportein ttyV6 107
gierd ttyD3 1
dave ttyD4 0
brent ttyp0 286
dave ttyq4 1296000
Another (possibly more practical) example of fixed-width input
data would be the input from a deck of balloting cards. In some
parts of the United States, voters make their choices by punching
holes in computer cards. These cards are then processed to count
the votes for any particular candidate or on any particular
issue. Since a voter may choose not to vote on some issue, any
column on the card may be empty. An awk program for
processing such data could use the FIELDWIDTHS
feature to simplify reading the data.
This feature is still experimental, and will likely evolve
over time.
In some data bases, a single line cannot conveniently hold all
the information in one entry. In such cases, you can use
multi-line records.
The first step in doing this is to choose your data format: when records are not
defined as single lines, how do you want to define them? What
should separate records?
One technique is to use an unusual character or string to separate records. For
example, you could use the formfeed character (written \f
in awk, as in C)
to separate them, making each record a page of the file. To do
this, just set the variable RS to "\f"
(a string containing the
formfeed character). Any other character could equally well be
used, as long as it won't be part of the data in a record.
Another technique is to have blank lines separate records. By
a special dispensation, a null string
as the value of RS indicates that records are
separated by one or more blank lines. If you set RS
to the null string, a record
always ends at the first blank line encountered. And the next
record doesn't start until the first nonblank line that
follows---no matter how many blank lines appear in a row, they
are considered one record-separator. (End of file is also
considered a record separator.)
The second step is to separate the fields in the record. One
way to do this is to put each field
on a separate line: to do this, just set the variable FS
to the string "\n".
(This simple regular expression matches a single newline.)
Another way to separate fields is to divide each of the lines
into fields in the normal manner. This happens by default as a
result of a special feature: when RS is set to the
null string, the newline
character always acts as a field
separator. This is in addition to whatever field separations result from FS.
The original motivation for this special exception was
probably so that you get useful behavior in the default case
(i.e., FS == " "). This feature can be a
problem if you really don't want the newline character to
separate fields, since there is no way to prevent it. However,
you can work around this by using the split function to break up the record
manually (see section Built-in
Functions for String Manipulation).
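The two steps described above (a null RS for blank-line separated records, and FS set to "\n" so that each line is a field) can be combined in a short sketch. The two address entries are made-up sample data, and the shell variable para_out exists only for the example.

```shell
# Two hypothetical entries separated by two blank lines;
# each line of an entry becomes one field.
para_out=$(printf 'Jane Doe\n123 Main St\n\n\nJohn Roe\n456 Elm Ave\n' |
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " | " $2 }')
printf '%s\n' "$para_out"
# 1: Jane Doe | 123 Main St
# 2: John Roe | 456 Elm Ave
```

The run of blank lines between the entries counts as a single record separator, so only two records are read.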
So far we have been getting our input files from awk's
main input stream---either the standard input (usually your
terminal) or the files specified on the command line. The awk
language has a special built-in command called getline
that can be used to read input under your explicit control.
This command is quite complex and should not be used
by beginners. It is covered here because this is the chapter on
input. The examples that follow the explanation of the getline
command include material that has not been covered yet.
Therefore, come back and study the getline command after
you have reviewed the rest of this manual and have a good
knowledge of how awk works.
getline returns 1 if it finds a record, and 0 if
the end of the file is encountered. If there is some error in
getting a record, such as a file that cannot be opened, then getline
returns -1. In this case, gawk sets the variable ERRNO
to a string describing the
error that occurred.
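The three return values can be observed with the form of getline that reads from a named file into a variable, which is one of the forms of getline covered in this section. This sketch assumes a mktemp utility, and assumes that the path /no/such/dir/x does not exist on your system; the names probe, f, and ret_out are invented for the example.

```shell
# A throwaway file holding exactly one record (hypothetical name).
probe=$(mktemp)
printf 'only line\n' > "$probe"

ret_out=$(awk -v f="$probe" 'BEGIN {
    r1 = (getline line < f)                  # a record was found
    r2 = (getline line < f)                  # end of file reached
    r3 = (getline line < "/no/such/dir/x")   # file cannot be opened
    print r1, r2, r3
}')
rm -f "$probe"
printf '%s\n' "$ret_out"
# 1 0 -1
```

Because a failed getline returns -1, which is nonzero, loops should test for a result greater than zero rather than simply for a true value.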
In the following examples, command stands for a string value that represents a
shell command.
getline
The getline command can be
used without arguments to read input from the current
input file. All it does in this case is read the next
input record and split it up into fields. This is useful
if you've finished processing the current record, but you
want to do some special processing right now on
the next record. Here's an example:
awk '{
    if (t = index($0, "/*")) {
        if (t > 1)
            tmp = substr($0, 1, t - 1)
        else
            tmp = ""
        u = index(substr($0, t + 2), "*/")
        while (u == 0) {
            getline
            t = -1
            u = index($0, "*/")
        }
        if (u <= length($0) - 2)
            $0 = tmp substr($0, t + u + 3)
        else
            $0 = tmp
    }
    print $0
}'
This awk program deletes all C-style comments, /*
... */, from the input. By replacing the print
$0 with other statements, you could perform more
complicated processing on the decommented input, like
searching for matches of a regular expression. (This
program has a subtle problem---can you spot it?)
This form of the getline command sets NF
(the number of fields;
see section Examining Fields), NR
(the number of records
read so far; see section How
Input is Split into Records), FNR (the number of records read
from this input file), and the value of $0.
Note: the new value of $0
is used in testing the patterns of any subsequent rules.
The original value of $0 that triggered the rule which executed getline
is lost. By contrast, the next statement
reads a new record but immediately begins processing it
normally, starting with the first rule in the program. See
section The next Statement.
getline var
This form of getline reads a
record into the variable var. This is useful
when you want your program to read the next record from
the current input file, but you don't want to subject the
record to the normal input processing.
For example, suppose the next line is a comment, or a
special string, and you
want to read it, but you must make certain that it won't
trigger any rules. This version of getline
allows you to read that line and store it in a variable
so that the main read-a-line-and-check-each-rule loop of awk
never sees it.
The following example swaps every two lines of input.
For example, given:
wan
tew
free
phore
it outputs:
tew
wan
phore
free
Here's the program:
awk '{
    if ((getline tmp) > 0) {
        print tmp
        print $0
    } else
        print $0
}'
The getline function
used in this way sets only the variables NR
and FNR (and of course, var). The
record is not split into fields, so the values of the
fields (including $0) and the value of NF
do not change.
getline <file
This form of the getline function takes its input
from the file file. Here file is a string-valued expression
that specifies the file name. <file is called
a redirection
since it directs input to come from a different place.
This form is useful if you want to read your input
from a particular file, instead of from the main input
stream. For example, the following program reads its
input record from the file foo.input when it
encounters a first field
with a value equal to 10 in the current input file.
awk '{
    if ($1 == 10)
        getline < "foo.input"
    print
}'
Since the main input stream is not used, the values of NR
and FNR are not changed. But the record read
is split into fields in the normal manner, so the values
of $0 and other fields are changed. So is
the value of NF.
This does not cause the record to be tested against
all the patterns in the awk program, in the
way that would happen if the record were read normally by
the main processing loop of awk. However the
new record is tested against any subsequent rules, just
as when getline is used without a redirection.
getline var <file
This form of the getline function takes its input
from the file file and puts it in the variable var.
As above, file is a string-valued expression
that specifies the file from which to read.
In this version of getline, none of the
built-in variables are changed, and the record is not
split into fields. The only variable changed is var.
For example, the following program copies all the
input files to the output, except for records that say @include filename.
Such a record is replaced by the contents of the file filename.
awk '{
    if (NF == 2 && $1 == "@include") {
        while ((getline line < $2) > 0)
            print line
        close($2)
    } else
        print
}'
Note here how the name of the extra input file is not
built into the program; it is taken from the data, from
the second field on the @include
line.
The close function
is called to ensure that if two identical @include
lines appear in the input, the entire specified file is
included twice. See section Closing
Input Files and Pipes.
One deficiency of this program is that it does not
process nested @include statements the way a
true macro preprocessor would.
command | getline
You can pipe
the output of a command into getline. A pipe
is simply a way to link the output of one program to the
input of another. In this case, the string command
is run as a shell command and its output is piped into awk
to be used as input. This form of getline
reads one record from the pipe.
For example, the following program copies input to
output, except for lines that begin with @execute,
which are replaced by the output produced by running the
rest of the line as a shell command:
awk '{
    if ($1 == "@execute") {
        tmp = substr($0, 10)
        while ((tmp | getline) > 0)
            print
        close(tmp)
    } else
        print
}'
The close function
is called to ensure that if two identical @execute
lines appear in the input, the command is run for each
one. See section Closing Input
Files and Pipes.
Given the input:
foo
bar
baz
@execute who
bletch
the program might produce:
foo
bar
baz
hack ttyv0 Jul 13 14:22
hack ttyp0 Jul 13 14:23 (gnu:0)
hack ttyp1 Jul 13 14:23 (gnu:0)
hack ttyp2 Jul 13 14:23 (gnu:0)
hack ttyp3 Jul 13 14:23 (gnu:0)
bletch
Notice that this program ran the command who
and printed the result. (If you try this program
yourself, you will get different results, showing you who
is logged in on your system.)
This variation of getline splits the
record into fields, sets the value of NF and
recomputes the value of $0. The values of NR
and FNR are not changed.
command | getline var
The output of the command command is sent
through a pipe to getline and into the
variable var. For example, the following
program reads the current date and time into the variable current_time,
using the date utility, and then prints it.
awk 'BEGIN {
    "date" | getline current_time
    close("date")
    print "Report printed on " current_time
}'
In this version of getline, none of the
built-in variables are changed, and the record is not
split into fields.
If the same file name or the same shell command is used with getline
more than once during the execution of an awk
program, the file is opened (or the command is executed) only the
first time. At that time, the first record of input is read from
that file or command. The next time the same file or command is
used in getline, another record is read from it, and
so on.
This implies that if you want to start reading the same file
again from the beginning, or if you want to rerun a shell command
(rather than reading more output from the command), you must take
special steps. What you must do is use the close function, as follows:
close(filename)
or
close(command)
The argument filename or command can be
any expression. Its value must exactly equal the string that was used to open the
file or start the command---for example, if you open a pipe with
this:
"sort -r names" | getline foo
then you must close it with this:
close("sort -r names")
Once this function call is
executed, the next getline from that file or command
will reopen the file or rerun the command.
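The effect of close on a command can be sketched with a command that produces a single line of output. The command echo hello and the variable names cmd, first, second, third, rest, and close_out are made up for the example.

```shell
close_out=$(awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline first             # runs the command, reads its only line
    rest = (cmd | getline second)   # same pipe, nothing left: returns 0
    close(cmd)                      # the string must match the one used above
    cmd | getline third             # the command is run again
    print first, rest, third
}')
printf '%s\n' "$close_out"
# hello 0 hello
```

Keeping the command in a variable, as here, is one way to make sure that the string given to close is exactly the string that opened the pipe.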
close returns a value of zero if the close
succeeded. Otherwise, the value will be non-zero. In this case, gawk
sets the variable ERRNO to a string describing the error that
occurred.