These commands work with (or produce) sorted files.
sort sorts, merges, or compares all the lines from the given files, or
standard input if none are given or for a file of -. By default, sort
writes the results to standard output. Synopsis:
sort [option]... [file]...
sort has three modes of operation: sort (the default), merge, and check
for sortedness. The following options change the operation mode:
-c Check whether the given files are already sorted: if they are not
all sorted, print an error message and exit with a status of 1.
-m Merge the given files by sorting them as a group. Each input file must always be
individually sorted. It always works to sort instead of merge; merging is provided because
it is faster, in the case where it works.
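As a rough illustration of the check and merge modes described above, the following sketch
builds two small sorted files and one unsorted file. The file names and the scratch
directory are hypothetical, and LC_ALL=C is set here only as an assumption to make the
ordering reproducible.

```shell
# Scratch directory and sample files (all names are illustrative).
tmp=$(mktemp -d)
printf 'apple\ncherry\n' > "$tmp/a.txt"
printf 'banana\ndate\n'  > "$tmp/b.txt"
printf 'b\na\n'          > "$tmp/unsorted.txt"

# Merge mode: both inputs are already sorted individually.
merged=$(LC_ALL=C sort -m "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$merged"          # apple, banana, cherry, date

# Check mode: a nonzero exit status signals that the file is not sorted.
if LC_ALL=C sort -c "$tmp/unsorted.txt" 2>/dev/null; then
    check_status=0
else
    check_status=$?
fi
echo "sort -c exit status: $check_status"   # 1 for the unsorted file

rm -rf "$tmp"
```

The error message from -c is discarded here so that only the exit status is inspected.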
A pair of lines is compared as follows: if any key fields have been specified, sort
compares each pair of fields, in the order specified on the command line, according to the
associated ordering options, until a difference is found or no fields are left.
If any of the global options Mbdfinr are given but no key fields are specified,
sort compares the entire lines according to the global options.
Finally, as a last resort when all keys compare equal (or if no ordering options were
specified at all),
sort compares the lines byte by byte in machine collating
sequence. The last resort comparison honors the -r global option. The -s
(stable) option disables this last-resort comparison so that lines in which all fields
compare equal are left in their original relative order. If no fields or global options
are specified, -s has no effect.
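The effect of the last-resort comparison and of -s can be seen with a small sketch. The
sample data is invented for illustration, and LC_ALL=C is an assumption added so that the
byte-by-byte comparison is predictable.

```shell
# Two lines share the numeric key 1; only the rest of the line differs.
input='2 b
1 z
1 a'

# Keys tie for the two "1" lines, so the last-resort byte comparison orders them.
default_out=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1,1n)
printf '%s\n' "$default_out"     # 1 a, 1 z, 2 b

# With -s the tied lines keep their original relative order.
stable_out=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 1,1n)
printf '%s\n' "$stable_out"      # 1 z, 1 a, 2 b
```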
GNU sort (as specified for all GNU utilities) has no limits on input line
length or restrictions on bytes allowed within lines. In addition, if the final byte of an
input file is not a newline, GNU sort silently supplies one.
If the environment variable TMPDIR is set, sort uses its value as the directory for
temporary files instead of /tmp. The -T tempdir option in turn overrides the environment
variable.
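A minimal sketch of the two ways to choose the temporary directory follows. The scratch
directory is hypothetical. For input this small, sort will probably not need temporary
files at all, so the example only shows how the setting is passed.

```shell
scratch=$(mktemp -d)   # hypothetical directory for sort's temporary files

# Via the environment variable:
env_out=$(printf 'b\na\n' | TMPDIR="$scratch" LC_ALL=C sort)

# Via the -T option, which overrides TMPDIR when both are present:
opt_out=$(printf 'b\na\n' | LC_ALL=C sort -T "$scratch")

printf '%s\n' "$env_out" "$opt_out"
rm -rf "$scratch"
```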
The following options affect the ordering of output lines. They may be specified
globally or as part of a specific key field. If no key fields are specified, global
options apply to comparison of entire lines; otherwise the global options are inherited by
key fields that do not specify any special options of their own.
-b Ignore leading blanks when finding sort keys in each line.
-d Sort in phone directory order: ignore all characters except letters,
digits and blanks when sorting.
-f Fold lowercase characters into the equivalent uppercase characters
when sorting so that, for example, b and B sort as equal.
-i Ignore characters outside the printable ASCII range 040-0176 octal
(inclusive) when sorting.
-M An initial string, consisting of any amount of whitespace, followed
by three letters abbreviating a month name, is folded to UPPER case and compared in the
order JAN < FEB < ... < DEC. Invalid names compare low to valid names.
-n Sort numerically: the number begins each line; specifically, it
consists of optional whitespace, an optional - sign, and zero or more digits,
optionally followed by a decimal point and zero or more digits.
-r Reverse the result of comparison, so that lines with greater key
values appear earlier in the output instead of later.
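The ordering options above can be compared side by side with a short sketch. The input
values are made up, and LC_ALL=C is an assumption used to keep the default ordering and the
month abbreviations predictable.

```shell
nums='10
9
100'

# Default comparison treats the lines as text.
text_order=$(printf '%s\n' "$nums" | LC_ALL=C sort)        # 10, 100, 9

# -n compares numerically; adding -r reverses the result.
num_order=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)      # 9, 10, 100
rev_order=$(printf '%s\n' "$nums" | LC_ALL=C sort -nr)     # 100, 10, 9

# -f folds lowercase to uppercase; equal lines fall back to byte order.
fold_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort -f)     # A, a, B, b

# -M compares three-letter month abbreviations in calendar order.
month_order=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M) # Jan, Feb, Mar

printf '%s\n' "$text_order" "$num_order" "$rev_order" "$fold_order" "$month_order"
```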
Other options are:
-o output-file Write output to output-file instead of standard output. If output-file
is one of the input files, sort copies it to a temporary file before sorting and writing
the output to output-file.
-t separator Use character separator as the
field separator when finding the sort keys in each line. By default, fields are separated
by the empty string between a non-whitespace character and a whitespace character. That
is, given the input line " foo bar", sort breaks it into fields " foo"
and " bar". The field separator is not considered to be part of either the field
preceding or the field following.
-u For the default case or the -m option, only output the
first of a sequence of lines that compare equal. For the -c option, check
that no pair of consecutive lines compares equal.
-k pos1[,pos2] The recommended, POSIX, option
for specifying a sort field. The field consists of the line between pos1 and pos2
(or the end of the line, if pos2 is omitted), inclusive. Fields and character
positions are numbered starting with 1. See below.
+pos1[-pos2] The obsolete, traditional option for
specifying a sort field. The field consists of the line between pos1 and up to
but not including pos2 (or the end of the line if pos2 is
omitted). Fields and character positions are numbered starting with 0. See below.
In addition, when GNU sort is invoked with exactly one argument, options --help
and --version are recognized. See section Common options.
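The -o, -t, -u and -k options described above are often combined. The following sketch
sorts a small colon-separated file in place; the file name and its contents are
hypothetical, and LC_ALL=C is an assumption added for reproducible ordering.

```shell
tmp=$(mktemp -d)
printf 'carol:30\nalice:25\nbob:25\nalice:25\n' > "$tmp/people.txt"

# -t :       use ":" as the field separator
# -k 2,2n    primary key: field 2, compared numerically
# -k 1,1     secondary key: field 1
# -u         keep only the first of lines that compare equal
# -o FILE    write the result back to the input file
LC_ALL=C sort -t : -k 2,2n -k 1,1 -u -o "$tmp/people.txt" "$tmp/people.txt"

result=$(cat "$tmp/people.txt")
printf '%s\n' "$result"          # alice:25, bob:25, carol:30
rm -rf "$tmp"
```

Writing to the input file with -o avoids the truncation that a shell redirection such as
`sort file > file` would cause.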
Historical (BSD and System V) implementations of sort have differed in
their interpretation of some options, particularly -b, -f, and -n.
GNU sort follows the POSIX behavior, which is usually (but not always!) like the System V
behavior. According to POSIX, -n no longer implies -b. For
consistency, -M has been changed in the same way. This may affect the meaning
of character positions in field specifications in obscure cases. The only fix is to add an
explicit -b.
A position in a sort field specified with the -k or + option
has the form f.c, where f is the number of the field to
use and c is the number of the first character from the beginning of the field
(for +pos) or from the end of the previous field (for -pos).
If the .c is omitted, it is taken to be the first character in the
field. If the -b option was specified, the .c part of
a field specification is counted from the first nonblank character of the field (for +pos)
or from the first nonblank character following the previous field (for -pos).
A sort key option may also have any of the option letters Mbdfinr appended
to it, in which case the global ordering options are not used for that particular field.
The -b option may be independently attached to either or both of the +pos
and -pos parts of a field specification, and if it is inherited
from the global options it will be attached to both. If a -n or -M
option is used, thus implying a -b option, the -b option is
taken to apply to both the +pos and the -pos
parts of a key specification. Keys may span multiple fields.
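Character positions and per-key option letters can be tried out with a sketch like the one
below, which uses only the -k form. The sample lines are invented, and LC_ALL=C is an
assumption for reproducibility.

```shell
# Key is characters 2 through 3 of field 1, compared numerically.
by_chars=$(printf 'x20\ny10\nz15\n' | LC_ALL=C sort -k 1.2,1.3n)
printf '%s\n' "$by_chars"        # y10, z15, x20

# Option letters appended to a key apply to that key only:
# field 2 is compared numerically and in reverse.
by_field=$(printf 'a 1\nb 3\nc 2\n' | LC_ALL=C sort -k 2,2nr)
printf '%s\n' "$by_field"        # b 3, c 2, a 1
```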
Here are some examples to illustrate various combinations of options. In them, the
POSIX -k option is used to specify sort keys rather than the obsolete +pos1-pos2 syntax.
- Sort in descending (reverse) numeric order.
sort -nr
- Sort alphabetically, omitting the first and second fields. This uses a single key
composed of the characters beginning at the start of field three and extending to the end
of each line.
sort -k 3
- Sort numerically on the second field and resolve ties by sorting alphabetically on the
third and fourth characters of field five. Use : as the field delimiter.
sort -t : -k 2,2n -k 5.3,5.4
Note that if you had written -k 2 instead of -k 2,2, sort
would have used all characters beginning in the second field and extending to the end of
the line as the primary numeric key. For the large majority of applications,
treating keys spanning more than one field as numeric will not do what you expect.
Also note that the n modifier was applied to the field-end specifier for
the first key. It would have been equivalent to specify -k 2n,2 or -k
2n,2n. All modifiers except b apply to the associated field,
regardless of whether the modifier character is attached to the field-start and/or the
field-end part of the key specifier.
- Sort the password file on the fifth field and ignore any leading white space. Sort lines
with equal values in field five on the numeric user ID in field three.
sort -t : -k 5b,5 -k 3,3n /etc/passwd
An alternative is to use the global numeric modifier -n.
sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
Finally, to ignore both leading and trailing white space, you could have applied the b
modifier to the field-end specifier for the first key,
sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
or you could have used the global -b modifier instead of -n, with an
explicit n on the second key specifier.
sort -t : -b -k 5,5 -k 3,3n /etc/passwd
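To see two of the points above on concrete data, the following sketch runs the
`-k 2,2n -k 5.3,5.4` example and then contrasts -k 2,2 with -k 2. The input lines are
invented for illustration, -s is added in the second part so that ties are visible, and
LC_ALL=C is an assumption for reproducible ordering.

```shell
records='a:2:x:x:zzab
b:1:x:x:zzzz
c:2:x:x:zzaa'

# Numeric sort on field 2; ties resolved on characters 3-4 of field 5.
two_keys=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4)
printf '%s\n' "$two_keys"        # b:..., c:... (aa), a:... (ab)

pairs='a:x:2
b:x:1'

# -k 2,2 stops at the end of field 2: both keys are "x", so -s keeps input order.
field_only=$(printf '%s\n' "$pairs" | LC_ALL=C sort -s -t : -k 2,2)
printf '%s\n' "$field_only"      # a:x:2, b:x:1

# -k 2 extends to the end of the line: the keys are "x:2" and "x:1".
to_eol=$(printf '%s\n' "$pairs" | LC_ALL=C sort -s -t : -k 2)
printf '%s\n' "$to_eol"          # b:x:1, a:x:2
```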
uniq writes the unique lines in the given input, or standard
input if nothing is given or for an input name of -. Synopsis:
uniq [option]... [input [output]]
uniq prints the unique lines in a sorted file, i.e., discards
all but one of identical successive lines. Optionally, it can instead show only lines that
appear exactly once, or lines that appear more than once.
The input must be sorted. If your input is not sorted, perhaps you want to use sort -u.
If no output file is specified, uniq writes to standard output.
The program accepts the following options. Also see section Common options.
-n, -f n, --skip-fields=n Skip n fields on each line before checking for
uniqueness. Fields are sequences of non-space non-tab characters that are separated from
each other by at least one space or tab.
+n, -s n, --skip-chars=n Skip n characters before
checking for uniqueness. If you use both the field and character skipping options, fields
are skipped over first.
-c, --count Print the number of times each line
occurred along with the line.
-d, --repeated Print only duplicate lines.
-u, --unique Print only unique lines.
-w n, --check-chars=n
Compare n characters on each line (after skipping any specified fields and
characters). By default the entire rest of the lines are compared.
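The options above can be exercised with a short sketch. The input is invented and is
sorted first, since uniq only compares successive lines; LC_ALL=C is an assumption for
reproducible ordering. Only the -f and -w spellings are used here, not the older -n and +n
forms.

```shell
sorted=$(printf 'c\na\nc\nb\na\nc\n' | LC_ALL=C sort)     # a a b c c c

distinct=$(printf '%s\n' "$sorted" | uniq)       # a, b, c
repeated=$(printf '%s\n' "$sorted" | uniq -d)    # a, c (lines seen more than once)
singles=$(printf '%s\n' "$sorted" | uniq -u)     # b    (lines seen exactly once)
counted=$(printf '%s\n' "$sorted" | uniq -c)     # each line prefixed by its count

# Skip the first field before comparing: "1 x" and "2 x" count as duplicates.
skip_first=$(printf '1 x\n2 x\n3 y\n' | uniq -f 1)             # 1 x, 3 y

# Compare only the first character of each line.
first_char=$(printf 'apple\navocado\nbanana\n' | uniq -w 1)    # apple, banana

printf '%s\n' "$distinct" "$repeated" "$singles" "$counted" "$skip_first" "$first_char"
```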
comm writes to standard output lines that are common, and lines that are
unique, to two input files; a file name of - means standard input. Synopsis:
comm [option]... file1 file2
The input files must be sorted before comm can be used.
With no options, comm produces three-column output. Column one contains
lines unique to file1, column two contains lines unique to file2,
and column three contains lines common to both files. Columns are separated by TAB.
The options -1, -2, and -3
suppress printing of the corresponding columns. Also see section Common options.
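A small sketch of the three-column output and of column suppression follows. The two files
and their contents are hypothetical, and LC_ALL=C is an assumption so that comm and the
sorted inputs agree on the ordering.

```shell
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/file1"
printf 'b\nc\nd\n' > "$tmp/file2"

# All three columns, separated by TAB characters.
all_cols=$(LC_ALL=C comm "$tmp/file1" "$tmp/file2")
printf '%s\n' "$all_cols"

# Suppress columns to keep a single set of lines.
common=$(LC_ALL=C comm -12 "$tmp/file1" "$tmp/file2")   # b, c  (in both files)
only1=$(LC_ALL=C comm -23 "$tmp/file1" "$tmp/file2")    # a     (only in file1)
only2=$(LC_ALL=C comm -13 "$tmp/file1" "$tmp/file2")    # d     (only in file2)

printf '%s\n' "$common" "$only1" "$only2"
rm -rf "$tmp"
```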