There are several ways to think about the differences between two files. One way to
think of the differences is as a series of lines that were deleted from, inserted in, or
changed in one file to produce the other file.
diff compares two files line
by line, finds groups of lines that differ, and reports each group of differing lines. It
can report the differing lines in several formats, which have different purposes.
diff can show whether files are different without detailing the
differences. It also provides ways to suppress certain kinds of differences that are not
important to you. Most commonly, such differences are changes in the amount of white space
between words or lines.
diff also provides ways to suppress differences in
alphabetic case or in lines that match a regular expression that you provide. These
options can accumulate; for example, you can ignore changes in both white space and
alphabetic case.
Another way to think of the differences between two files is as a sequence of pairs of
characters that can be either identical or different.
cmp reports the
differences between two files character by character, instead of line by line. As a
result, it is more useful than diff for comparing binary files. For text
files, cmp is useful mainly when you want to know only whether two files are
identical.
To illustrate the effect that considering changes character by character can have
compared with considering them line by line, think of what happens if a single newline
character is added to the beginning of a file. If that file is then compared with an
otherwise identical file that lacks the newline at the beginning, diff will
report that a blank line has been added to the file, while cmp will report
that almost every character of the two files differs.
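The contrast described above can be reproduced with a small sketch. It uses Python's difflib as a stand-in for a line-by-line comparison and a plain positional comparison as a stand-in for a character-by-character one; neither is how diff or cmp is implemented, and the sample text is made up for illustration.

```python
import difflib

original = "alpha\nbeta\ngamma\n"
shifted = "\n" + original  # same text with one newline added at the start

# Line-by-line view: count the lines that fall outside the common sequences.
matcher = difflib.SequenceMatcher(
    None, original.splitlines(), shifted.splitlines(), autojunk=False
)
changed_lines = sum(
    max(i2 - i1, j2 - j1)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != "equal"
)

# Character-by-character view: count positions whose characters differ.
char_mismatches = sum(1 for x, y in zip(original, shifted) if x != y)

print(changed_lines, "changed line(s);",
      char_mismatches, "of", len(original), "character positions differ")
```

The line-oriented view sees a single inserted blank line, while the positional view is thrown off for nearly the whole file because every character has shifted by one place.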
diff3 normally compares three input files line by line, finds groups of
lines that differ, and reports each group of differing lines. Its output is designed to
make it easy to inspect two different sets of changes to the same file.
- Hunks: Groups of differing lines.
- White Space: Suppressing differences in white space.
- Blank Lines: Suppressing differences in blank lines.
- Case Folding: Suppressing differences in alphabetic case.
- Specified Folding: Suppressing differences that match regular expressions.
- Brief: Summarizing which files are different.
- Binary: Comparing binary files or forcing text comparisons.
When comparing two files,
diff finds sequences of lines common to both
files, interspersed with groups of differing lines called hunks. Comparing two
identical files yields one sequence of common lines and no hunks, because no lines differ.
Comparing two entirely different files yields no common lines and one large hunk that
contains all lines of both files. In general, there are many ways to match up lines
between two given files.
diff tries to minimize the total hunk size by
finding large sequences of common lines interspersed with small hunks of differing lines.
For example, suppose the file F contains the three lines a, b,
c, and the file G contains the same three lines in reverse order c,
b, a. If diff finds the line c as
common, then the command diff F G produces this output:
     1,2d0
     < a
     < b
     3a2,3
     > b
     > a
But if diff notices the common line b instead, it produces this output:
     1c1
     < a
     ---
     > c
     3c3
     < c
     ---
     > a
It is also possible to find a as the common line. diff
does not always find an optimal matching between the files; it takes shortcuts to run
faster. But its output is usually close to the shortest possible. You can adjust this
tradeoff with the --minimal option (see section
diff Performance Tradeoffs).
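To see why more than one matching is possible for F and G, the following sketch enumerates every longest common subsequence of two lists of lines. It is an illustrative brute-force helper (the name longest_common_subsequences is hypothetical), not the algorithm diff itself uses, and it is only practical for very small inputs.

```python
from functools import lru_cache


def longest_common_subsequences(f, g):
    """Return every distinct longest common subsequence of two line lists."""
    f, g = tuple(f), tuple(g)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # All longest common subsequences of f[i:] and g[j:].
        if i == len(f) or j == len(g):
            return frozenset({()})
        if f[i] == g[j]:
            # Matching the two leading lines is always part of a best answer.
            return frozenset((f[i],) + rest for rest in solve(i + 1, j + 1))
        candidates = solve(i + 1, j) | solve(i, j + 1)
        best = max(len(c) for c in candidates)
        return frozenset(c for c in candidates if len(c) == best)

    return sorted(solve(0, 0))


for common in longest_common_subsequences(["a", "b", "c"], ["c", "b", "a"]):
    print("possible common lines:", list(common))
```

Each of the three lines can serve as the single common line, so three equally short matchings exist and a comparison program is free to pick any of them.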
The -b and --ignore-space-change options ignore white space
at line end, and consider all other sequences of one or more white space characters to be
equivalent. With these options,
diff considers the following two lines to be
equivalent, where $ denotes the line end:
Here lyeth  muche rychnesse  in lytell space.   -- John Heywood$
Here lyeth muche rychnesse in lytell space. -- John Heywood   $
The -w and --ignore-all-space options are stronger than -b.
They ignore differences even if one file has white space where the other file has none. White
space characters include tab, newline, vertical tab, form feed, carriage return, and
space; some locales may define additional characters to be white space. With these
options, diff considers the following two lines to be equivalent, where $
denotes the line end and ^M denotes a carriage return:
Here lyeth muche rychnesse in lytell space.-- John Heywood$
He relyeth much erychnes seinly tells pace. --John Heywood ^M$
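The two levels of white space suppression can be approximated by normalizing each line before comparing. The sketch below is an assumption-laden illustration: the helper names are hypothetical, Python's \s class stands in for the locale's notion of white space, and lines are assumed to be passed without their terminating newline.

```python
import re


def normalize_space_change(line):
    """Approximate -b: drop trailing white space, collapse other runs to one space."""
    return re.sub(r"\s+", " ", line.rstrip())


def normalize_all_space(line):
    """Approximate -w: remove every white space character."""
    return re.sub(r"\s+", "", line)


a = "Here lyeth  muche rychnesse  in lytell space.   -- John Heywood"
b = "Here lyeth muche rychnesse in lytell space. -- John Heywood   "
c = "Here lyeth muche rychnesse in lytell space.-- John Heywood"
d = "He relyeth much erychnes seinly tells pace. --John Heywood \r"

print(normalize_space_change(a) == normalize_space_change(b))
print(normalize_all_space(c) == normalize_all_space(d))
print(normalize_space_change(c) == normalize_space_change(d))
```

The first two comparisons report equality, while the third shows that the weaker normalization still distinguishes lines whose white space falls in different places.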
The -B and --ignore-blank-lines options ignore insertions or
deletions of blank lines. These options normally affect only lines that are completely
empty; they do not affect lines that look empty but contain space or tab characters. With
these options, for example, a file containing

     1. A point is that which has no part.

     2. A line is breadthless length.
     -- Euclid, The Elements, I

is considered identical to a file containing

     1. A point is that which has no part.
     2. A line is breadthless length.


     -- Euclid, The Elements, I
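A rough way to picture this behavior is to drop completely empty lines from both inputs before comparing them. The helper below is a hypothetical sketch under that assumption, not a description of how diff implements the option; note that a line holding only spaces or tabs is kept.

```python
def without_empty_lines(text):
    """Drop lines that are completely empty; keep lines holding spaces or tabs."""
    return [line for line in text.splitlines() if line != ""]


first = (
    "1. A point is that which has no part.\n"
    "\n"
    "2. A line is breadthless length.\n"
    "-- Euclid, The Elements, I\n"
)
second = (
    "1. A point is that which has no part.\n"
    "2. A line is breadthless length.\n"
    "\n"
    "\n"
    "-- Euclid, The Elements, I\n"
)

print(without_empty_lines(first) == without_empty_lines(second))
print(without_empty_lines("a\n \nb\n"))
```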
diff can treat lowercase letters as equivalent to their uppercase
counterparts, so that, for example, it considers Funky Stuff, funky
STUFF, and fUNKy stuFf to all be the same. To request this, use the -i
or --ignore-case option.
To ignore insertions and deletions of lines that match a regular expression, use the -I
regexp or --ignore-matching-lines=regexp option.
You should escape regular expressions that contain shell metacharacters to prevent the
shell from expanding them. For example, diff -I '^[0-9]' ignores all changes
to lines beginning with a digit.
However, -I only ignores the insertion or deletion of lines that contain
the regular expression if every changed line in the hunk (every insertion and every
deletion) matches the regular expression. In other words, for each nonignorable change,
diff prints the complete set of changes in its vicinity, including the ignorable ones.
You can specify more than one regular expression for lines to ignore by using more than
one -I option.
diff tries to match each line against each
regular expression, starting with the last one given.
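The all-or-nothing rule for a hunk can be written as a short predicate. This is a sketch of the rule as stated above, using Python's re module (whose regular expression syntax is not identical to the one diff accepts); the function name hunk_is_ignorable is hypothetical.

```python
import re


def hunk_is_ignorable(changed_lines, patterns):
    """True only if every inserted or deleted line matches at least one pattern."""
    compiled = [re.compile(p) for p in patterns]
    return all(
        any(rx.search(line) for rx in compiled)
        for line in changed_lines
    )


# Every changed line starts with a digit, so the whole hunk can be ignored.
print(hunk_is_ignorable(["1 apple", "2 pears"], [r"^[0-9]"]))

# One changed line does not match, so the complete hunk is reported.
print(hunk_is_ignorable(["1 apple", "pears"], [r"^[0-9]"]))
```

With several patterns, a line counts as ignorable when any one of them matches, which mirrors giving more than one -I option.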
When you only want to find out whether files are different, and you don't care what the
differences are, you can use the summary output format. In this format, instead of showing
the differences between the files,
diff simply reports whether files differ.
The -q and --brief options select this output format.
This format is especially useful when comparing the contents of two directories. It is
also much faster than doing the normal line by line comparisons, because diff
can stop analyzing the files as soon as it knows that there are any differences.
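The speed advantage comes from being able to stop early. A minimal sketch of that idea, assuming two binary streams and a hypothetical chunk_size parameter, reads both inputs in blocks and returns at the first block that differs; it is not taken from the diff sources.

```python
import io


def streams_differ(stream_a, stream_b, chunk_size=8192):
    """Return True as soon as two binary streams are known to differ."""
    while True:
        chunk_a = stream_a.read(chunk_size)
        chunk_b = stream_b.read(chunk_size)
        if chunk_a != chunk_b:
            return True   # no need to read the rest of either stream
        if not chunk_a:
            return False  # both streams ended without any difference


print(streams_differ(io.BytesIO(b"same data"), io.BytesIO(b"same data")))
print(streams_differ(io.BytesIO(b"abc" * 5000), io.BytesIO(b"abd" * 5000)))
```

For real files, the streams would be opened in binary mode, for example with open(path, "rb").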
You can also get a brief indication of whether two files differ by using cmp.
For files that are identical,
cmp produces no output. When the files differ,
cmp outputs the byte offset and line number where the first
difference occurs. You can use the -s option to suppress that information, so that
cmp produces no output and reports whether the files differ using only
its exit status (see section Invoking cmp).
Unlike diff, cmp cannot compare directories; it can only
compare two files.
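The kind of information cmp reports for the first difference can be sketched as follows. The function first_difference is a hypothetical helper; it counts bytes and lines from 1, and it treats a shorter input as differing at its first missing byte, which is a simplification made for this illustration rather than a statement of how cmp handles that case.

```python
def first_difference(data_a, data_b):
    """Return (byte_offset, line_number) of the first difference, or None."""
    for index, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            # Lines are counted by the newlines seen before the differing byte.
            line = data_a.count(b"\n", 0, index) + 1
            return index + 1, line
    if len(data_a) != len(data_b):
        # Simplification: one input is a prefix of the other.
        shorter = min(len(data_a), len(data_b))
        return shorter + 1, data_a.count(b"\n", 0, shorter) + 1
    return None


print(first_difference(b"abc\ndef\n", b"abc\ndeX\n"))
print(first_difference(b"same", b"same"))
```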
If diff thinks that either of the two files it is comparing is binary (a
non-text file), it normally treats that pair of files much as if the summary output format
had been selected (see section Summarizing Which Files Differ),
and reports only that the binary files are different. This is because line by line
comparisons are usually not meaningful for binary files.
diff determines whether a file is text or binary by checking the first few
bytes in the file; the exact number of bytes is system dependent, but it is typically
several thousand. If every character in that part of the file is non-null, diff
considers the file to be text; otherwise it considers the file to be binary.
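The test described above amounts to looking for a null byte near the start of the file. The sketch below expresses that heuristic; the name looks_binary and the default probe_size of 4096 bytes are assumptions chosen for illustration, since the actual number of bytes examined is system dependent.

```python
def looks_binary(data, probe_size=4096):
    """Guess that data is binary if a null byte appears in its first probe_size bytes."""
    return b"\0" in data[:probe_size]


print(looks_binary(b"plain text\n"))
print(looks_binary(b"abc\0def"))

# A null byte beyond the probed region goes unnoticed by this heuristic.
print(looks_binary(b"a" * 10 + b"\0", probe_size=5))
```

The last call shows the limitation of checking only the beginning of a file: a file whose null bytes appear later is still treated as text.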
Sometimes you might want to force
diff to consider files to be text. For
example, you might be comparing text files that contain null characters; diff
would erroneously decide that those are non-text files. Or you might be comparing
documents that are in a format used by a word processing system that uses null characters
to indicate special formatting. You can force
diff to consider all files to
be text files, and compare them line by line, by using the -a or --text
option. If the files you compare using this option do not in fact contain text, they will
probably contain few newline characters, and the
diff output will consist of
hunks showing differences between long lines of whatever characters the files contain.
You can also force
diff to consider all files to be binary files, and
report only whether they differ (but not how). Use the --brief option for this.
In operating systems that distinguish between text and binary files, diff
normally reads and writes all data as text. Use the --binary option to force diff
to read and write binary data instead. This option has no effect on a Posix-compliant
system like GNU or traditional Unix. However, many personal computer operating systems
represent the end of a line with a carriage return followed by a newline. On such systems,
diff normally ignores these carriage returns on input and generates them at
the end of each output line, but with the --binary option diff
treats each carriage return as just another input character, and does not generate a
carriage return at the end of each output line. This can be useful when dealing with
non-text files that are meant to be interchanged with Posix-compliant systems.
If you want to compare two files byte by byte, you can use the cmp program
with the -l option to show the values of each differing byte in the two
files. With GNU cmp, you can also use the -c option to show the
ASCII representation of those bytes. See section Invoking cmp
for more information.
If diff3 thinks that any of the files it is comparing is binary (a
non-text file), it normally reports an error, because such comparisons are usually not
useful. diff3 uses the same test as
diff to decide whether a
file is binary. As with
diff, if the input files contain a few non-text
characters but otherwise are like text files, you can force
diff3 to consider
all files to be text files and compare them line by line by using the -a or --text
options.