|
One of the most common things that actions do is to output or print
some or all of the input. For simple output, use the print
statement. For fancier formatting use the printf
statement. Both are described in this chapter.
- Print: The print statement.
- Print Examples: Simple examples of print statements.
- Output Separators: The output separators and how to change them.
- OFMT: Controlling Numeric Output With print.
- Printf: The printf statement.
- Redirection: How to redirect output to multiple files and pipes.
- Special Files: File name interpretation in gawk. gawk allows access to inherited file descriptors.
The print statement does output with simple,
standardized formatting. You specify only the strings or numbers
to be printed, in a list separated by commas. They are output,
separated by single spaces, followed by a newline. The statement
looks like this:
print item1, item2, ...
The entire list of items may optionally be enclosed in
parentheses. The parentheses are necessary if any of the item
expressions uses a relational operator; otherwise it could be
confused with a redirection
(see section Redirecting Output of print
and printf). The relational operators are ==, !=, <, >, >=, <=,
~ and !~ (see section Comparison Expressions).
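As a minimal sketch of why the parentheses matter (the variable x and the scratch directory are illustrative assumptions, not part of the original text): with parentheses, > is a comparison whose result, 1 or 0, is printed; without them, the same characters form a redirection that writes to a file named 3.

```shell
# Parenthesized: ">" and "==" are comparisons, so 1 (true) and 0 (false) are printed.
compared=$(awk 'BEGIN { x = 5; print (x > 3); print (x == 3) }')
printf '%s\n' "$compared"

# Unparenthesized: "print x > 3" is a redirection that writes 5 to a file named "3".
# Run it in a scratch directory so the stray file is easy to find and remove.
scratch=$(mktemp -d)
( cd "$scratch" && awk 'BEGIN { x = 5; print x > 3 }' )
redirected=$(cat "$scratch/3")
printf 'file "3" contains: %s\n' "$redirected"
rm -r "$scratch"
```

If a comparison result unexpectedly goes missing from the output, a stray file named after the right-hand operand is a likely sign of this mistake.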
The items printed can be constant strings or numbers, fields
of the current record (such as $1), variables, or
any awk expressions. The print
statement is completely general for computing what
values to print. With two exceptions, you cannot specify how
to print them: how many columns, whether to use exponential
notation or not, and so on. (See section Output Separators, and section Controlling Numeric Output with print.)
For that, you need the printf statement (see section Using printf
Statements for Fancier Printing).
The simple statement print with no items is equivalent
to print $0: it prints the entire current record. To
print a blank line, use print "", where ""
is the null, or empty, string.
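The following sketch illustrates both points on a single made-up input record (the record "a b" is an assumption chosen for the example): a bare print and print $0 produce the same line, and print "" produces a blank line between them.

```shell
# One hypothetical input record, "a b".
out=$(printf 'a b\n' | awk '{
    print        # no items: same as print $0
    print ""     # empty string: a blank line
    print $0     # explicit form
}')
printf '%s\n' "$out"
```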
To print a fixed piece of text, use a string constant such as "Hello
there" as one item. If you forget to use the
double-quote characters, your text will be taken as an awk
expression, and you will probably get an error. Keep in mind that
a space is printed between any two items.
Most often, each print statement makes one line
of output. But it isn't limited to one line. If an item value is
a string that contains a
newline, the newline is output along with the rest of the string. A single print
can make any number of lines
this way.
Here is an example of printing a string
that contains embedded newlines:
awk 'BEGIN { print "line one\nline two\nline three" }'
produces output like this:
line one
line two
line three
Here is an example that prints the first two fields of each
input record, with a space between them:
awk '{ print $1, $2 }' inventory-shipped
Its output looks like this:
Jan 13
Feb 15
Mar 15
...
A common mistake in using the print statement is
to omit the comma between two items. This often has the effect of
making the items run together in the output, with no space. The
reason for this is that juxtaposing two string expressions in awk
means to concatenate them. For example, without the comma:
awk '{ print $1 $2 }' inventory-shipped
prints:
Jan13
Feb15
Mar15
...
Neither example's output makes much sense to someone
unfamiliar with the file inventory-shipped. A heading
line at the beginning would make it clearer. Let's add some
headings to our table of months ($1) and green
crates shipped ($2). We do this using the BEGIN pattern (see: BEGIN and END
Special Patterns) to force the headings to be printed only
once:
awk 'BEGIN { print "Month Crates"
             print "----- ------" }
           { print $1, $2 }' inventory-shipped
Did you already guess what happens? This program prints the
following:
Month Crates
----- ------
Jan 13
Feb 15
Mar 15
...
The headings and the table data don't line up! We can fix this
by printing some spaces between the two fields:
awk 'BEGIN { print "Month Crates"
             print "----- ------" }
           { print $1, " ", $2 }' inventory-shipped
You can imagine that this way of lining up columns can get
pretty complicated when you have many columns to fix. Counting
spaces for two or three columns can be simple, but with more than this
you can get "lost" quite easily. This is why the printf
statement was created (see section Using printf Statements for
Fancier Printing); one of its specialties is lining up columns of data.
As mentioned previously, a print statement
contains a list of items, separated by commas. In the output, the
items are normally separated by single spaces. But they do not
have to be spaces; a single space is only the default. You can
specify any string of
characters to use as the output field separator
by setting the built-in variable OFS. The initial
value of this variable is the string " ", that is, just a single space.
The output from an entire print statement is
called an output record. Each print
statement outputs one output record and then outputs a string called the output
record separator. The built-in variable ORS
specifies this string. The
initial value of the variable is the string "\n"
containing a newline character; thus, normally each print
statement makes a separate line.
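You can change how output fields and records are separated by
assigning new values to the variables OFS and/or ORS.
The usual place to do this is in the BEGIN rule (see section BEGIN and END
Special Patterns), so that it happens before any input is processed.
You may also do this with assignments on the command line, before the
names of your input files.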
The following example prints the first and second fields of
each input record separated by a semicolon, with a blank line
added after each line:
awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
{ print $1, $2 }' BBS-list
If the value of ORS does not contain a newline,
all your output will be run together on a single line, unless you
output newlines some other way.
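A small sketch of that last point, using two made-up records rather than BBS-list: here ORS is set to "; " and contains no newline, so both output records land on one line.

```shell
# Hypothetical two-record input; OFS joins the fields, ORS ends each record.
out=$(printf 'x 1\ny 2\n' | awk 'BEGIN { OFS = "-"; ORS = "; " }
                                 { print $1, $2 }')
printf '%s\n' "$out"
```

Note that the separator also follows the final record, since ORS is written after every print statement, not between them.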
When you use the print statement to print numeric
values, awk internally converts the number to a string of characters, and prints
that string. awk
uses the sprintf function
to do this conversion. For now, it suffices to say that the sprintf function accepts a format
specification that tells it how to format numbers (or strings), and
that there are a number of
different ways that numbers can be formatted. The different format specifications are
discussed more fully in section Using printf
Statements for Fancier Printing.
The built-in variable OFMT contains the default format specification that print
uses with sprintf when it wants to convert a number to a string for printing. By supplying
different format specifications
as the value of OFMT, you can change how print
will print your numbers. As a brief example:
awk 'BEGIN { OFMT = "%d" # print numbers as integers
print 17.23 }'
will print 17.
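As a further sketch (the values and the "%.2f" format are chosen only for illustration), setting OFMT to a two-decimal format affects every non-integral number that print converts:

```shell
# OFMT governs how print converts non-integral numbers to strings.
out=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 2.5 }')
printf '%s\n' "$out"    # 3.14 2.50
```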
If you want more precise control over the output format than print
gives you, use printf. With printf you
can specify the width to use for each item, and you can specify
various stylistic choices for numbers (such as what radix to use,
whether to print an exponent, whether to print a sign, and how
many digits to print after the decimal point). You do this by
specifying a string, called the format string, which controls
how and where to print the other arguments.
The printf statement looks like this:
printf format, item1, item2, ...
The entire list of arguments may optionally be enclosed in
parentheses. The parentheses are necessary if any of the item
expressions uses a relational operator; otherwise it could be
confused with a redirection
(see section Redirecting Output of print
and printf). The relational operators are ==, !=, <, >, >=, <=,
~ and !~ (see section Comparison Expressions).
The difference between printf and print
is the argument format.
This is an expression whose value is taken as a string; it specifies how to output
each of the other arguments. It is called the format string.
The format string is the same as in the ANSI C library function printf. Most
of format is text to
be output verbatim. Scattered among this text are format specifiers,
one per item. Each format
specifier says to output the next item at that place in the format.
The printf statement does not automatically
append a newline to its output. It outputs only what the format specifies. So if you want a
newline, you must include one in the format. The output separator
variables OFS and ORS have no effect on printf
statements.
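The contrast with print can be seen in a short sketch (the separator values are arbitrary choices for the example): printf ignores OFS and ORS and adds no newline, while print in the same program uses both.

```shell
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    printf "%s %s", "a", "b"   # no separators and no newline are added
    printf "\n"                # the newline must be written explicitly
    print "a", "b"             # print, by contrast, uses OFS and ORS
}')
printf '%s\n' "$out"
```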
A format specifier starts
with the character % and ends with a format-control
letter; it tells the printf statement how to
output one item. (If you actually want to output a %,
write %%.) The format-control
letter specifies what kind of value to print. The rest of the format specifier is made up of
optional modifiers which are parameters such as the field width to use.
Here is a list of the format-control
letters:
- c  This prints a number as an ASCII character. Thus, printf "%c", 65 outputs the letter A. The output for a string value is the first character of the string.
- d  This prints a decimal integer.
- i  This also prints a decimal integer.
- e  This prints a number in scientific (exponential) notation. For example,
  printf "%4.3e", 1950
  prints 1.950e+03, with a total of four significant figures of which three follow the decimal point. The 4.3 are modifiers, discussed below.
- f  This prints a number in floating point notation.
- g  This prints a number in either scientific notation or floating point notation, whichever uses fewer characters.
- o  This prints an unsigned octal integer.
- s  This prints a string.
- x  This prints an unsigned hexadecimal integer.
- X  This prints an unsigned hexadecimal integer. However, for the values 10 through 15, it uses the letters A through F instead of a through f.
- %  This isn't really a format-control letter, but it does have a meaning when used after a %: the sequence %% outputs one %. It does not consume an argument.
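To see several of these letters side by side, here is a short sketch; the argument values are arbitrary, and the | characters in the format are only there to separate the fields visually.

```shell
out=$(awk 'BEGIN {
    # One specifier per argument; the final %% prints a literal percent sign.
    printf "%c|%d|%e|%f|%o|%s|%x|%X|%%\n", 65, 42, 1950, 3.5, 8, "str", 255, 255
}')
printf '%s\n' "$out"    # A|42|1.950000e+03|3.500000|10|str|ff|FF|%
```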
A format specification can
also include modifiers that can control how much of
the item's value is printed and how much space it gets. The
modifiers come between the % and the format-control letter. Here are
the possible modifiers, in the order in which they may appear:
- -  The minus sign, used before the width modifier, says to left-justify the argument within its specified width. Normally the argument is printed right-justified in the specified width. Thus,
  printf "%-4s", "foo"
  prints "foo " (foo followed by one space).
- width  This is a number representing the desired width of a field. Inserting any number between the % sign and the format-control character forces the field to be expanded to this width. The default way to do this is to pad with spaces on the left. For example,
  printf "%4s", "foo"
  prints " foo" (one space followed by foo).
  The value of width is a minimum width, not a maximum. If the item value requires more than width characters, it can be as wide as necessary. Thus,
  printf "%4s", "foobar"
  prints foobar.
  Preceding the width with a minus sign causes the output to be padded with spaces on the right, instead of on the left.
- .prec  This is a number that specifies the precision to use when printing. This specifies the number of digits you want printed to the right of the decimal point. For a string, it specifies the maximum number of characters from the string that should be printed.
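The modifiers above can be tried out with a sketch like the following. The square brackets are an illustrative device, not part of the original examples; they make the padding spaces visible.

```shell
out=$(awk 'BEGIN {
    printf "[%-4s]\n", "foo"       # left-justified in width 4
    printf "[%4s]\n", "foo"        # right-justified (the default)
    printf "[%4s]\n", "foobar"     # width is a minimum, so nothing is cut
    printf "[%.3s]\n", "abcdefg"   # precision caps a string at 3 characters
    printf "[%8.2f]\n", 3.14159    # width 8, two digits after the point
}')
printf '%s\n' "$out"
```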
The C library printf's
dynamic width and prec capability (for
example, "%*.*s") is supported. Instead of
supplying explicit width and/or prec values
in the format string, you pass them in the
argument list. For example:
w = 5
p = 3
s = "abcdefg"
printf "<%*.*s>\n", w, p, s
is exactly equivalent to
s = "abcdefg"
printf "<%5.3s>\n", s
Both programs output <**abc>. (We have used the bullet symbol "*"
to represent a space, to clearly show you that there are two
spaces in the output.)
Earlier versions of awk did not support this
capability. You may simulate it by using concatenation to build up the format string, like so:
w = 5
p = 3
s = "abcdefg"
printf "<%" w "." p "s>\n", s
This is not particularly easy to read, however.
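The three approaches can be compared directly in one program. This is a sketch that assumes an awk supporting the * modifier (gawk does); the angle brackets simply delimit the field so the padding is visible.

```shell
out=$(awk 'BEGIN {
    w = 5; p = 3; s = "abcdefg"
    printf "<%*.*s>\n", w, p, s          # width and precision taken from the arguments
    printf "<%5.3s>\n", s                # the same values written into the format
    printf "<%" w "." p "s>\n", s        # format built by concatenation
}')
printf '%s\n' "$out"
```

All three lines should be identical: the first three characters of s, right-justified in a field of five.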
Here is how to use printf to make an aligned
table:
awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
prints the names of bulletin boards ($1) of the
file BBS-list as a string
of 10 characters, left justified. It also prints the phone
numbers ($2) afterward on the line. This produces an
aligned two-column table of names and phone numbers:
aardvark   555-5553
alpo-net   555-3412
barfly     555-7685
bites      555-1675
camelot    555-0542
core       555-2912
fooey      555-1234
foot       555-6699
macfoo     555-6480
sdace      555-3430
sabafoo    555-2127
Did you notice that we did not specify that the phone numbers
be printed as numbers? They had to be printed as strings because
the numbers are separated by a dash. This dash would be
interpreted as a minus sign if we had tried to print the phone
numbers as numbers. This would have led to some pretty confusing
results.
We did not specify a width for the phone numbers because they
are the last things on their lines. We don't need to put spaces
after them.
We could make our table look even nicer by adding headings to
the tops of the columns. To do this, use the BEGIN pattern (see section BEGIN and END
Special Patterns) to force the header to be printed only once, at the
beginning of the awk program:
awk 'BEGIN { print "Name       Number"
             print "----       ------" }
           { printf "%-10s %s\n", $1, $2 }' BBS-list
Did you notice that we mixed print and printf
statements in the above example? We could have used just printf
statements to get the same results:
awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
printf "%-10s %s\n", "----", "------" }
{ printf "%-10s %s\n", $1, $2 }' BBS-list
By outputting each column heading with the same format specification used for the
elements of the column, we have made sure that the headings are
aligned just like the columns.
The fact that the same format
specification is used three times can be emphasized by storing it
in a variable, like this:
awk 'BEGIN { format = "%-10s %s\n"
printf format, "Name", "Number"
printf format, "----", "------" }
{ printf format, $1, $2 }' BBS-list
See if you can use the printf statement to line
up the headings and table data for our inventory-shipped
example covered earlier in the section on the print
statement (see section The print Statement).
So far we have been dealing only with output that prints to
the standard output, usually your terminal. Both print
and printf can also send their output to other
places. This is called redirection.
A redirection appears after
the print or printf statement.
Redirections in awk are written just like
redirections in shell commands, except that they are written
inside the awk program.
Here are the three forms of output redirection. They are all shown
for the print statement, but they work identically
for printf also.
- print items > output-file
This type of redirection
prints the items onto the output file output-file.
The file name output-file can be any
expression. Its value is changed to a string and then used as a
file name (see section Expressions
as Action Statements).
When this type of redirection
is used, the output-file is erased before the
first output is written to it. Subsequent writes do not
erase output-file, but append to it. If output-file
does not exist, then it is created.
For example, here is how one awk program
can write a list of BBS names to a file name-list
and a list of phone numbers to a file phone-list.
Each output file contains one name or number per line.
awk '{ print $2 > "phone-list"
print $1 > "name-list" }' BBS-list
- print items >> output-file
This type of redirection
prints the items onto the output file output-file.
The difference between this and the single-> redirection is that the
old contents (if any) of output-file are not
erased. Instead, the awk output is appended
to the file.
- print items | command
It is also possible to send output through a pipe
instead of into a file. This type of redirection opens a pipe
to command and writes the values of items
through this pipe, to another process created to execute command.
The redirection
argument command is actually an awk
expression. Its value is converted to a string, whose contents
give the shell command to be run.
For example, this produces two files, one unsorted
list of BBS names and one list sorted in reverse
alphabetical order:
awk '{ print $1 > "names.unsorted"
print $1 | "sort -r > names.sorted" }' BBS-list
Here the unsorted list is written with an ordinary redirection while the
sorted list is written by piping through the sort
utility.
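The three forms can be exercised together in a sketch like this one. The scratch directory, the file name names, and the awk variable f are assumptions made for the example; the input is a few made-up BBS names rather than the BBS-list file.

```shell
scratch=$(mktemp -d)
list="$scratch/names"

# ">" starts the file afresh on the first write of a run, then keeps adding to it.
printf 'barfly\naardvark\n' | awk -v f="$list" '{ print $1 > f }'
# ">>" in a later run appends to what is already there.
printf 'camelot\n' | awk -v f="$list" '{ print $1 >> f }'
names=$(cat "$list")
printf '%s\n' "$names"

# "|" sends the items to a shell command instead of a file.
sorted=$(printf 'barfly\naardvark\ncamelot\n' | awk '{ print $1 | "sort -r" }')
printf '%s\n' "$sorted"
rm -r "$scratch"
```

Had the second run used > instead of >>, the file would have been erased first and would contain only camelot.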
Here is an example that uses redirection to mail a
message to a mailing list bug-system. This
might be useful when trouble is encountered in an awk
script run periodically for system maintenance.
report = "mail bug-system"
print "Awk script failed:", $0 | report
print "at record number", FNR, "of", FILENAME | report
close(report)
We call the close function here because it's
a good idea to close the pipe as soon as all the intended
output has been sent to it. See section Closing Output Files and Pipes,
for more information on this. This example also
illustrates the use of a variable to represent a file
or command: it is not necessary to always use
a string constant.
Using a variable is generally a good idea, since awk
requires you to spell the string
value identically every time.
Redirecting output using >, >>,
or | asks the system to open a file or pipe only if
the particular file or command you've
specified has not already been written to by your program, or if
it has been closed since it was last written to.
When a file or pipe is opened, the file name or command
associated with it is remembered by awk and
subsequent writes to the same file or command are appended to the
previous writes. The file or pipe stays open until awk
exits. This is usually convenient.
Sometimes there is a reason to close an output file or pipe
earlier than that. To do this, use the close function, as follows:
close(filename)
or
close(command)
The argument filename or command can be
any expression. Its value must exactly equal the string used to open the file or
pipe to begin with---for example, if you open a pipe with this:
print $1 | "sort -r > names.sorted"
then you must close it with this:
close("sort -r > names.sorted")
Here are some reasons why you might need to close an output
file:
- To write a file and read it back later on in the same awk program. Close the file when you are finished writing it; then you can start reading it with getline (see also: Explicit Input with getline).
- To write numerous files, successively, in the same awk program. If you don't close the files, eventually you may exceed a system limit on the number of open files in one process. So close each one when you are finished writing it.
- To make a command finish. When you redirect output
through a pipe, the command reading the pipe normally
continues to try to read input as long as the pipe is
open. Often this means the command cannot really do its
work until the pipe is closed. For example, if you
redirect output to the
mail program, the
message is not actually sent until the pipe is closed.
- To run the same program a second time, with the same
arguments. This is not the same thing as giving more
input to the first run!
For example, suppose you pipe
output to the mail program. If you output
several lines redirected to this pipe without closing it,
they make a single message of several lines. By contrast,
if you close the pipe after each line of output, then
each line makes a separate message.
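The same effect can be observed without a mail program. In this sketch, sort stands in for mail (an assumption made so the example can run anywhere): closing the pipe between the two pairs of lines gives two separate runs of sort, so each pair is sorted on its own.

```shell
out=$(awk 'BEGIN {
    cmd = "sort"
    print "b" | cmd; print "d" | cmd
    close(cmd)                 # first run of sort ends here and prints b, d
    print "a" | cmd; print "c" | cmd
    close(cmd)                 # a second, separate run prints a, c
}')
printf '%s\n' "$out"
```

Without the first close, all four lines would go to a single sort process and come out as a, b, c, d.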
close returns a value of zero if the close
succeeded. Otherwise, the value will be non-zero. In this case, gawk
sets the variable ERRNO to a string describing the error that
occurred.
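Here is a minimal sketch of the first reason listed above, writing a file and reading it back in the same program, which also shows the return value of close. The temporary file and the variable names f, rc, line, and n are illustrative assumptions.

```shell
tmp=$(mktemp)
out=$(awk -v f="$tmp" 'BEGIN {
    print "first"  > f
    print "second" > f
    rc = close(f)                          # finish writing before reading back
    while ((getline line < f) > 0) n++     # reopen the same file for input
    close(f)
    print "close returned " rc ", lines read back: " n
}')
printf '%s\n' "$out"
rm -f "$tmp"
```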
Running programs conventionally have three input and output
streams already available to them for reading and writing. These
are known as the standard input, standard output,
and standard error output. These streams are, by
default, terminal input and output, but they are often redirected
with the shell, via the <, <<, >, >>, >&
and | operators. Standard error is used only for
writing error messages; the reason we have two separate streams,
standard output and standard error, is so that they can be
redirected separately.
In other implementations of awk, the only way to
write an error message to standard error in an awk
program is as follows:
print "Serious error detected!\n" | "cat 1>&2"
This works by opening a pipeline to a shell command which can
access the standard error stream which it inherits from the awk
process. This is far from elegant, and is also inefficient, since
it requires a separate process. So people writing awk
programs have often neglected to do this. Instead, they have sent
the error messages to the terminal, like this:
NF != 4 {
printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/tty"
}
This has the same effect most of the time, but not always:
although the standard error stream is usually the terminal, it
can be redirected, and when that happens, writing to the terminal
is not correct. In fact, if awk is run from a
background job, it may not have a terminal at all. Then opening /dev/tty
will fail.
gawk provides special file names for accessing
the three standard streams. When you redirect input or output in gawk,
if the file name matches one of these special names, then gawk
directly uses the stream it stands for.
- /dev/stdin  The standard input (file descriptor 0).
- /dev/stdout  The standard output (file descriptor 1).
- /dev/stderr  The standard error output (file descriptor 2).
- /dev/fd/N  The file associated with file descriptor N. Such a file must have been opened by the program initiating the awk execution (typically the shell). Unless you take special pains, only descriptors 0, 1 and 2 are available.
The file names /dev/stdin, /dev/stdout, and /dev/stderr
are aliases for /dev/fd/0, /dev/fd/1, and /dev/fd/2,
respectively, but they are more self-explanatory.
The proper way to write an error message in a gawk
program is to use /dev/stderr, like this:
NF != 4 {
printf("line %d skipped: doesn't have 4 fields\n", FNR) > "/dev/stderr"
}
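To check that such messages really go to standard error and not standard output, the two streams can be captured separately. In this sketch the shell function name check_fields, the sample input, and the slightly reworded message are assumptions made for the example.

```shell
# Hypothetical wrapper around the rule above; valid records print their first field.
check_fields() {
    awk 'NF != 4 { printf("line %d skipped: expected 4 fields\n", FNR) > "/dev/stderr"; next }
         { print $1 }'
}

out=$(printf 'a b c d\nx y\n' | check_fields 2>/dev/null)      # standard output only
err=$(printf 'a b c d\nx y\n' | check_fields 2>&1 >/dev/null)  # standard error only
printf 'stdout: %s\nstderr: %s\n' "$out" "$err"
```

Because the message travels on standard error, redirecting the program's normal output to a file or pipe leaves the diagnostics visible.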
gawk also provides special file names that give
access to information about the running gawk
process. Each of these "files" provides a single record of
information. To read them more than once, you must first close
them with the close function
(see section Closing Input Files and
Pipes). The filenames are:
- /dev/pid  Reading this file returns the process ID of the current process, in decimal, terminated with a newline.
- /dev/ppid  Reading this file returns the parent process ID of the current process, in decimal, terminated with a newline.
- /dev/pgrpid  Reading this file returns the process group ID of the current process, in decimal, terminated with a newline.
- /dev/user  Reading this file returns a single record terminated with a newline. The fields are separated with blanks. The fields represent the following information:
  - $1  The value of the getuid system call.
  - $2  The value of the geteuid system call.
  - $3  The value of the getgid system call.
  - $4  The value of the getegid system call.
  If there are any additional fields, they are the group IDs returned by the getgroups system call. (Multiple groups may not be supported on all systems.)
These special file names may be used on the command line as
data files, as well as for I/O redirections within an awk
program. They may not be used as source files with the -f
option.
Recognition of these special file names is disabled if gawk
is in compatibility mode. Caution: Unless your
system actually has a /dev/fd directory (or any of the
other above listed special files), the interpretation of these
file names is done by gawk itself. For example,
using /dev/fd/4 for output will actually write on
file descriptor 4, and not on a new file descriptor that was dup'ed
from file descriptor 4. Most of the time this does not matter;
however, it is important to not close any of the files
related to file descriptors 0, 1, and 2. If you do close one of
these files, unpredictable behavior will result.
|