20+ Debugging Techniques (2019)


When you set out to fix a problem, it’s important to select the most appropriate strategy for debugging. This blog presents 20+ debugging techniques for the effective use of the debugger, the editor, and command-line tools. Here you’ll find additional things you should consider in order to keep your productivity high.


Explore Debug Data with Your Editor

Debuggers may get all the credit, but your code editor (or IDE) can often be an equally nifty tool for locating the source of a bug. Use a real editor, such as Emacs or vim, or a powerful IDE.


Whatever you do, trade up from your system’s basic built-in editor, such as Notepad (Windows), TextEdit (OS X), or Nano and Pico (various Unix distributions). These editors offer only rudimentary facilities.


Your editor’s search command can help you navigate to the code that may be associated with the problem you’re facing. In contrast to your IDE’s function to find all uses of a given identifier, your editor’s search function casts a wider net because it can be more flexible, and also because it includes in the search space text appearing in comments.


One way to make your search more flexible is to search with the stem of the word. Say you’re looking for code associated with an ordering problem. Don’t search for “ordering.” Rather, search for “order,” which will get you all occurrences of order, orders, and ordering.


You can also specify a regular expression to encompass all possible strings that interest you. If there’s a problem involving a coordinate field specified as x1, x2, y1, or y2, then you can locate references to any one of these fields by searching for [xy][12].
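The same search can be scripted when you want a list of hits rather than an interactive jump. The following is a minimal sketch, assuming a made-up source fragment (`SOURCE`) and a hypothetical helper name; the `\b` word-boundary anchors are an addition to the pattern above, so that names such as `max1` are not reported.

```python
import re

# Hypothetical source fragment, used only to exercise the pattern.
SOURCE = """\
width = x2 - x1;
height = y2 - y1;
area = width * height;  // x3 is not a coordinate field
"""

# \b anchors keep the pattern from matching inside longer names.
COORDINATE = re.compile(r"\b[xy][12]\b")

def find_coordinate_refs(text):
    """Return (line number, field name) for each coordinate field reference."""
    return [
        (number, match.group())
        for number, line in enumerate(text.splitlines(), start=1)
        for match in COORDINATE.finditer(line)
    ]

for number, name in find_coordinate_refs(SOURCE):
    print(f"{number}: {name}")
```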


In other cases, your editor can help you pinpoint code that fails to behave in an expected way. Consider the following JavaScript code, which will not display the failure message it should.

var failureMessage = "Failure!",
    failureOccurences = 5;

// More code here
if (failureOccurrences > 0)
    alert(failureMessage);


After a long, stressful day, you may fail to spot the small but obvious error. However, searching for “failureOccurrences” in the code will locate only one of the two variables (the other is spelled “failureOccurences”).


Searching for an identifier is particularly effective for locating typos when the name of the identifier you’re looking for comes from another source: copy-and-pasted from its definitive definition or a displayed error message or carefully typed in.
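When you don’t yet know which identifier to search for, a small script can point out suspicious near-duplicates. This is a complementary heuristic rather than the editor technique described above: a rough sketch that compares every pair of identifiers with the standard `difflib` module. The function name, the sample snippet, and the 0.9 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
import itertools
import re

def similar_identifiers(source, cutoff=0.9):
    """Return pairs of distinct identifiers that are nearly identical."""
    names = sorted(set(re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", source)))
    pairs = []
    for first, second in itertools.combinations(names, 2):
        # ratio() is 1.0 for equal strings and drops as they diverge.
        if difflib.SequenceMatcher(None, first, second).ratio() >= cutoff:
            pairs.append((first, second))
    return pairs

# Hypothetical JavaScript snippet containing a misspelled variable name.
SNIPPET = """
var failureMessage = "Failure!",
    failureOccurences = 5;
if (failureOccurrences > 0)
    alert(failureMessage);
"""

for pair in similar_identifiers(SNIPPET):
    print(pair)
```

The pairwise comparison is quadratic in the number of identifiers, so this suits a single file better than a whole code base.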


A neat trick is to use the editor’s command to search for occurrences of the same word. With the Vim editor, you can search forward for identifiers that are the same as the one under the cursor by pressing * (or # for searching backward). In the Emacs editor, the corresponding incantation is Ctrl-s, Ctrl-w.


Your editor is useful when you perform differential debugging. If you have two (in theory) identical complex statements that behave differently, you can quickly spot any differences by copy-and-pasting the one below the other. You can then compare them letter by letter, rather than having your eyes and mind wander from one part of the screen to another.


For larger blocks that you want to compare, consider splitting your editor window into two vertical halves and putting one block beside the other: this makes it easy to spot any important differences.


Ideally, you’d want a tool such as diff to identify differences, but this can be tricky if the two files you want to compare differ in nonessential elements, such as IP addresses, timestamps, or arguments passed to routines.


Again, your editor can help you here by allowing you to replace the different nonessential text with identical placeholders. As an example, the following vim regular expression substitution command will replace all instances of a Chrome version identifier (e.g., Chrome/45.0.2454.101) appearing in a log file with a string identifying only the major version (e.g., Chrome/45).

:%s/\(Chrome\/[^.]*\)[^ ]*/\1
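If you need to repeat this normalization on many files before comparing them, the same substitution can be scripted. Here is a minimal sketch using Python’s `re` module with the same pattern; the helper name and the two sample log lines are invented for illustration.

```python
import re

# Same idea as the Vim substitution: keep "Chrome/<major>" and drop the rest
# of the version token up to the next space.
CHROME_VERSION = re.compile(r"(Chrome/[^.]*)[^ ]*")

def normalize(line):
    """Replace a full Chrome version identifier with its major version."""
    return CHROME_VERSION.sub(r"\1", line)

# Hypothetical log lines that differ only in the minor version numbers.
first = "GET /cart 200 Chrome/45.0.2454.101 Safari/537.36"
second = "GET /cart 200 Chrome/45.0.2454.85 Safari/537.36"

print(normalize(first))
print(normalize(first) == normalize(second))
```

Once both files are normalized in this way, a diff between them reports only the differences that matter.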


Finally, the editor can be of great help when you’re trying to pinpoint an error using a long log file chock-full of data. First, your editor makes the removal of nonessential lines child’s play.


For example, if you want to delete all lines containing the string poll from a log file, in vi you’d enter :g/poll/d, whereas in Emacs you’d invoke (M-x) delete-matching-lines.


You can issue such commands multiple times (issuing undo when you overdo it) until the only things left in your log file are the records that really interest you. If the log file’s contents are still too complex to keep in your head, consider commenting the file in the places you understand. For example, you might add “start of the transaction,” “transaction failed,” “retry.”
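When the same noise has to be stripped from many log files, the deletions can be captured in a few lines of code. The sketch below assumes made-up log records and a hypothetical helper name; it keeps only the lines that contain none of the given strings.

```python
def drop_matching_lines(lines, *noise):
    """Keep only the lines that contain none of the noise substrings."""
    return [line for line in lines if not any(word in line for word in noise)]

# Hypothetical log records, invented for this illustration.
LOG = [
    "12:00:01 poll queue=orders",
    "12:00:02 start of transaction 17",
    "12:00:03 heartbeat ok",
    "12:00:04 transaction 17 failed",
    "12:00:05 poll queue=orders",
]

for line in drop_matching_lines(LOG, "poll", "heartbeat"):
    print(line)
```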


If you’re examining a large file with a logical block structure you can also use your editor’s outlining facilities to quickly fold and unfold diverse parts and navigate between them. At this point, you can also split your editor’s window into multiple parts so that you can concurrently view related parts.


Things to Remember

  1. Locate misspelled identifiers using your editor’s search commands.
  2. Edit text files to make differences stand out.
  3. Edit log files to increase their readability.


Optimize Your Work Environment

Debugging is a demanding activity. If your development environment is not well tuned to your needs, you can easily die the death of a thousand cuts.


First come the hardware and software you work with. Ensure that you have adequate CPU power, main memory, and secondary storage space at your disposal (locally or on a cloud infrastructure).


Some static analysis tools require a powerful CPU and a lot of memory; for other tasks, you may need to store on disk multiple copies of the project or gigabytes of logs or telemetry data. In other cases, you may benefit from being able to easily launch additional host instances on the cloud.


You shouldn’t have to fight for these resources: your time is (or should be) a lot more valuable than their cost. The same goes for software. Here the limitations can stem both from false economies and from excessive restrictions regarding what software you’re allowed to download, install, and use.


Again, if some software will help you debug a problem, it’s inexcusable to have this withheld from you. Debugging is hard enough as it is without additional restrictions on facilities and tools.


Having assembled the resources, spend some effort to make the best out of them. A good personal setup that includes key bindings, aliases, helper scripts, shortcuts, and tool configurations can significantly enhance your debugging productivity. Here are some things you can set up and examples of corresponding Bash commands.


Ensure that your PATH environment variable is composed of all directories that contain the programs you run. When debugging, you may often use system administration commands, so include those in your path.

export PATH="/sbin:/usr/sbin:$PATH"


Configure your shell and editor to automatically complete elements they can deduce. The following example for Git can save you many keystrokes as you juggle between various branches.

# Obtain a copy of the Git completion script
if ! [ -f ~/.bash_completion.d/git-completion.bash ] ; then
  mkdir -p ~/.bash_completion.d
  curl https://raw.githubusercontent.com/git/git/master/contrib/completion/git-completion.bash \
    >~/.bash_completion.d/git-completion.bash
fi

# Enable completion of Git commands
source ~/.bash_completion.d/git-completion.bash


Set your shell prompt and terminal bar to show your identity, the current directory, and host. When debugging, you often use diverse hosts and identities, so a clear identification of your status can help you keep your sanity. 


Configure command-line editing key bindings to match those of your favorite editor. This will boost your productivity when building data analysis pipelines in an incremental fashion.

set -o emacs
# Or
set -o vi


Create aliases or shortcuts for frequently used commands and common typos.

alias h='history 15'
alias j=jobs
alias mroe=more


Set environment variables so that various utilities, such as the version control system, will use the paging program and editor of your choice.

export PAGER=less
export VISUAL=vim
export EDITOR=ex

Log all your commands into a history file so that you can search for valuable debugging incantations months later. Note that you can avoid logging a command invocation (e.g., one that contains a password) by prefixing it with a space.


# Increase the history file size
export HISTFILESIZE=1000000000
export HISTSIZE=1000000
export HISTTIMEFORMAT="%F %T "
# Ignore duplicate lines and lines that start with space
export HISTCONTROL=ignoreboth
# Save multi-line commands as a single line with semicolons
shopt -s cmdhist
# Append to the history file
shopt -s histappend


Allow the shell’s pathname expansion (globbing—e.g., *) to include files located in subdirectories.
shopt -s globstar

This simplifies applying commands on deep directory hierarchies through the use of the ** wildcard, which expands to all specified files in a directory tree. For example, the following command will count the number of files whose author is James Gosling, by looking at the JavaDoc tag of Java source code files.

grep -l '@author.*James Gosling' **/*.java | wc -l
33


Then comes the configuration of individual programs. Invest time to learn and configure the debugger, the editor, the IDE, the version control system, and the humble pager you’re using to match your preferences and working style. IDEs and sophisticated editors support many helpful plugins.


Select the ones you find useful, and set up a simple way to install them on each host on which you set up shop. You will recoup the investment in configuring your tools multiple times over the years.


When debugging, your work will often straddle multiple hosts. There are three important time savers in this context. First, ensure that you can log in to each remote host you use (or execute a command there) without entering your password.


On Unix systems, you can easily do this by setting up a public-private key pair (you typically run ssh-keygen for this) and storing the public key on the remote host in the file named .ssh/authorized_keys.


Second, set up host aliases so that you can access a host by using a short descriptive name, rather than its full name, possibly prefixed by a different username. You store these aliases in a file named .ssh/config.


Third, find out how you can invoke a GUI application on a remote host and have it display on your desktop. Although this operation can be tricky to set up, it is nowadays possible with most operating systems. Being able to run a GUI debugger or an IDE on a remote host can give you a big productivity boost.


Debugging tasks often span the command line and the GUI world. Therefore, knowing how to connect the two in your environment can be an important time saver. One common thing you’ll find useful is the ability to launch a GUI program from the command line.


The command to use is start under Windows, open under OS X, gnome-open under Gnome, and kde-open under KDE.


You will also benefit from being able to copy text (e.g., a long path of a memory dump file) between the command line and the GUI clipboard. Under Windows, you can use the winclip command of the Outwit suite, or, if you have Cygwin installed, you can read from or write to the /dev/clipboard file.


Under Gnome and KDE, you can use the xsel command. If you work on multiple GUI environments, you may want to create a command alias that works in the same way across all environments.


Also, configure your GUI so that you can launch your favorite editor through a file’s context menu and open a shell window with a given current directory through a directory’s context menu. And, if you don’t know that you can drag and drop file names from the GUI’s file browser into a shell window, try it out; it works beautifully.


Having made the investment to create all your nifty configuration files, spend some time to ensure they’re consistently available on all hosts where you’re debugging software.


A nice way to do this is to put the files under version control. This allows you to push improvements or compatibility fixes from any host into a central repository and later pull them back to other hosts.


Setting up shop on a new host simply involves checking out the repository’s files in your new home directory. If you’re using Git to manage your configuration files, specify the files from your home directory that you want to manage in a .gitignore file, such as the following.


# Ignore everything
*
# But not these files...
!.bashrc
!.editrc
!.gdbinit


Note that the advice in this section is mostly based on things I’ve found useful over the years. Your needs and development environment may vary considerably from mine. Regularly monitor your development environment to pinpoint and alleviate sources of friction.


If you find yourself repeatedly typing a long sequence of commands or performing many mouse clicks for an operation that could be automated, invest the time to package what you’re doing into a script.


If you find tools getting in your way rather than helping you, determine how to configure them to match your requirements or look for better tools. Finally, look around and ask for other people’s tricks and tools. Someone else may have already found an elegant solution to a problem that is frustrating you.


Things to Remember

  • Boost your productivity through the appropriate configuration of the tools you’re using.
  • Share your environment’s configuration among hosts with a version control system.


Hunt Bugs with the Revision Control System

Many bugs you’ll encounter are associated with software changes. New features and fixes, inevitably, introduce new bugs. A revision control system, such as Git, Mercurial, Subversion, or CVS, allows you to dig into the history in order to retrieve valuable intelligence regarding the problem you’re facing.


To benefit from this you must be diligently managing your software’s revisions with a version control system. By “diligently” I mean that you should be recording each change in a separate self-contained commit, documented with a meaningful commit message, and (where applicable) linked to the corresponding issue.


Here are the most useful ways in which a version control system can help your debugging work. The examples use Git’s command-line operations because these work in all environments.


If you prefer to use a GUI tool to perform these tasks, by all means, do so. If you’re using another revision control system, consult its documentation on how you can perform these operations, or consider switching to Git to benefit from all its power. Note that not all version control systems are created equal.


In particular, many have painful and inefficient support for local branching and merging—features that are essential when you debug by experimenting with alternative implementations.


When a new bug appears in your software, begin by reviewing what changes were made to it.

git log


If you know that the problem is associated with a specific file, specify it so that you will only see changes associated with that file.

git log path/to/myfile.js


If you suspect that the problem is associated with particular code lines, you can obtain a listing of the code with each line annotated with details regarding its last change.

git blame path/to/myfile.js

(Specify the -M and -C options to track lines moved within a file and between files, respectively.)


If the code associated with the problem is no longer there, you can search for it in the past by looking for a deleted string.

git rev-list --all | xargs git grep extinctMethodName

If you know that the problem appeared after a specific version (say V1.2.3), you can review the changes that occurred after that version.

git log V1.2.3..


If you don’t know the version number but you know the date on which the problem appeared, you can obtain the SHA hash of the last commit before that date.

git rev-list -n 1 --before=2015-08-01 master

You can then use the SHA hash in place of the version string.


If you know that the problem appeared when a specific issue (say, issue 1234) was fixed, you can search for commits associated with that issue.

git log --all --grep='Issue #1234'

(This assumes that a commit addressing issue 1234 will include the string “Issue #1234” in its message.)


In all the preceding cases, once you have the SHA hash of the commit you want to review (say, 1cb6e3f6), you can inspect the changes associated with it.

git show 1cb6e3f6


You may also want to see the code changes between the two releases.

git diff V1.2.3..V1.3.2


Often, a simple review of the changes can lead you to the problem’s cause. Alternately, having obtained from the commit descriptions the names of the developers associated with a suspect change, you can have a talk with them to see what they were thinking when they wrote that code.


You can also use the version control system as a time-travel machine. For example, you may want to check out an old correct version (say V1.1.0) to run that code under the debugger and compare it with the current one.


git checkout V1.1.0

Even more impressive, if you know that a bug was introduced between, say, V1.1.0 and V1.2.3 and you have a script, say, test.sh, that will exit with a non-zero code if a test fails, you can ask Git to perform a binary search among all changes until it locates the one that introduced the bug.

git bisect start V1.2.3 V1.1.0
git bisect run ./test.sh
git bisect reset
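To see why this search is so cheap, it helps to look at the underlying idea in isolation. The following is a simplified sketch of a binary search over a linear history, not Git’s actual implementation (which also copes with branching histories). The function name, the commit labels, and the `is_bad` test are all hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the earliest bad commit and the number of tests that were run.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the culprit is also bad.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tested += 1
        if is_bad(commits[middle]):
            bad = middle       # the culprit is at or before the middle
        else:
            good = middle      # the culprit comes after the middle
    return commits[bad], tested

# Hypothetical history: 100 commits, with the bug introduced in c73.
HISTORY = [f"c{n}" for n in range(100)]
culprit, runs = first_bad(HISTORY, lambda commit: int(commit[1:]) >= 73)
print(culprit, runs)
```

Each test run halves the range of suspect commits, so the number of runs grows with the logarithm of the history’s length rather than with the length itself.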


Git also allows you to experiment with fixes by creating a local branch that you can then integrate or remove.

git checkout -b issue-work-1234

# If the experiment was successful, integrate the branch
git checkout master
git merge issue-work-1234

# If the experiment failed, delete the branch
git checkout master
git branch -D issue-work-1234


Finally, given that you may be asked to urgently debug an issue while you’re working on something else, you may want to temporarily hide your changes while you work on the customer’s version.

git stash save interrupted-to-work-on-V1234
# Work on the debugging issue
git stash pop


Things to Remember

  • Examining a file’s history with a version control system can show you when and how bugs were introduced.
  • Use a version control system to look at the differences between correct and failing software versions.


Use Monitoring Tools on Systems Composed of Independent Processes

Modern software-based systems rarely consist of a single stand-alone program, which you need to debug when it fails. Instead, they comprise diverse services, components, and libraries.


The quick and efficient identification of the failed element should be your first win when debugging such a system. You can easily accomplish this on the server side by using an existing infrastructure monitoring system or by setting up and running one.


In the following paragraphs, I’ll use as an example the popular Nagios tool. This is available both as free software and through supported products and services. If your organization already uses another system, work on that one; the principles are the same. Whatever you do, avoid the temptation to concoct a system on your own.


Over a quick home-brewed solution or a passive recording system such as collectd or RRDtool, Nagios offers many advantages: tested passive and active service checks and notifiers, a dashboard, a round-robin event database, unobtrusive monitoring schedules, scalability, and a large user community that contributes plugins.


If your setup is running on the cloud or if it is based on a commonly used application stack, you may also be able to use a cloud-based monitoring system offered as a service. For example, Amazon Web Services (AWS) offers monitoring for the services it provides.


To be able to zero in efficiently on problems, you must monitor the whole stack of your application. Start from the lowest-level resources by monitoring the health of individual hosts: CPU load, memory use, network reachability, number of executing processes and logged-in users, available software updates, free disk space, open file descriptors, consumed network and disk bandwidth, system logs, security, and remote access.


Moving up one level, verify the correct and reliable functioning of the services your software requires to run: databases, email servers, application servers, caches, network connections, backups, queues, messaging, software licenses, web servers, and directories. Finally, monitor in detail the health of your application. The details here will vary. It’s best to monitor


  • The end-to-end availability of your application (e.g., if completing a web form will end with a fulfilled transaction)
  • Individual parts of the application, such as web services, database tables, static web pages, interactive web forms, and reporting
  • Key metrics, such as response latency, queued and fulfilled orders, number of active users, failed transactions, raised errors, reported crashes, and so on


When something fails, Nagios will update the corresponding service status on its web interface. In addition, you want to be notified of the failure immediately, for example, with an SMS or an email. 


For services that fail sporadically, the immediate notification may allow you to debug the service while it is in a failed state, making it easier to pinpoint the cause. You can also arrange for Nagios to open a ticket so that the issue can be assigned, followed, and documented.


Nagios also allows you to see a histogram for the events associated with a service over time. Poring over the time where the failures occur can help you identify other factors that lead to failures, such as excessive CPU load or memory pressure.


If you monitor a service’s complete stack, some low-level failures will cause a cascade of other problems. In such cases, you typically want to start your investigation at the lowest-level failed element.


If the available notification options do not suit your needs, you can easily write a custom notification handler.


Setting up Nagios is easy. The software is available as a package for most popular operating systems and includes built-in support for monitoring key host resources and popular network services. In addition, more than a thousand plugins allow the monitoring of all possible services, from the cloud, clustering, and CMS to security and web forms.


Again, if no plugin matches your requirements, you can easily script your own checker. Simply have the script print the service’s status and exit with 0 if the service you’re checking is OK and with 2 if there’s a critical error.
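To make the status-and-exit-code convention concrete, here is a minimal sketch of a checker for free disk space, one of the host resources mentioned earlier. It is an illustration, not a replacement for the stock plugins: the function names, the message wording, and the 20% and 10% thresholds are assumptions. Besides the 0 and 2 codes described above, it also uses 1 for a warning and 3 for an unknown state, following the usual Nagios plugin convention.

```python
import shutil

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def classify_free_space(free_fraction, warn=0.20, crit=0.10):
    """Map the fraction of free disk space to a (status code, message) pair."""
    percent = f"{free_fraction:.0%} free"
    if free_fraction < crit:
        return CRITICAL, f"DISK CRITICAL - {percent}"
    if free_fraction < warn:
        return WARNING, f"DISK WARNING - {percent}"
    return OK, f"DISK OK - {percent}"

def check_disk(path="/"):
    """Check the volume holding path; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        return UNKNOWN, f"DISK UNKNOWN - {error}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - volume reports zero size"
    return classify_free_space(usage.free / usage.total)

status, message = check_disk("/")
print(message)
# A real plugin would finish with sys.exit(status), so that Nagios
# receives the status as the process's exit code.
```

Keeping the classification separate from the measurement makes the thresholds easy to test without having to fill up a disk.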


As an example, the following shell script verifies that a given storage volume has been backed up as a timestamped AWS snapshot.


Things to Remember

  1. Set up a monitoring infrastructure to check all parts composing the service you’re offering.
  2. Quick notification of failures may allow you to debug your system in its failed state.
  3. Use the failure history to identify patterns that may help you pinpoint a problem’s cause.


Simplify the Suspect Code

Complex code is difficult to debug. Many possible execution paths and intricate data flows can confuse your thinking and add to the work you must do to pinpoint the flaw. Therefore, it’s often helpful to simplify the failing code. You can do this temporarily, in order to make the flaw stand out, or permanently in order to fix it.


Before embarking on drastic simplifications, ensure you have a safe way to revert them. All the files you’ll modify should be under version control, and you should have a way to return the code to its initial state, preferably by working on a private branch.


Temporary modifications typically entail drastically pruning the code. Your goal here is to remove as much code as possible while keeping the failure. This will minimize the amount of suspect code and make it easier to identify the fault. In a typical cycle, you remove a large code block or a call to a complex function, you compile, and then you test the result.


If the result still fails, you continue your pruning; if not, you reduce the pruning you performed. Note that if a failure disappears during a pruning step, you have strong reasons to believe that the code you pruned is somehow associated with the failure. This leads to an alternate approach, in which you try to make the failure go away by removing as little code as possible.


Although you can use your version control system to checkpoint the pruning steps, it’s often quicker to just use your editor. Keep the code in an open editor window, and save the changes after each modification.


If after pruning the failure persists, continue the process. If the failure disappears, undo the previous step, reduce the code you pruned, and repeat. You can systematically perform this task through a binary search process.
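The prune-and-test cycle can be expressed as a short loop. The sketch below is a simplified, greedy variant of the idea, assuming that the suspect code can be treated as a list of independent lines and that a hypothetical `still_fails` function builds and tests a candidate. It tries to remove large chunks first and halves the chunk size whenever no more chunks of that size can go.

```python
def prune(lines, still_fails):
    """Remove as many lines as possible while the failure persists."""
    chunk = len(lines) // 2
    while chunk >= 1:
        start = 0
        while start < len(lines):
            candidate = lines[:start] + lines[start + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate   # failure persists: keep the pruning
            else:
                start += chunk      # failure vanished: undo, try the next chunk
        chunk //= 2
    return lines

# Hypothetical program in which the failure needs both "x" and "y" lines.
PROGRAM = ["a", "x", "b", "c", "y", "d"]
print(prune(PROGRAM, lambda code: "x" in code and "y" in code))
```

In practice each call to `still_fails` is a build followed by a test run, so starting with large chunks keeps the number of expensive iterations low.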


Resist the temptation to comment-out code blocks: the nesting of embedded comments in those blocks will be a source of problems. Instead, in languages that support a preprocessor, you can use preprocessor conditionals.

#ifdef ndef
    // code you don't want to be executed
#endif



In other languages, you can temporarily put the statements you want to disable in the block of an if (false) conditional statement. Sometimes instead of removing code, it’s easier to adjust it to simplify its execution.


For example, add a false value at the beginning of an if or loop conditional to ensure that the corresponding code won’t get executed. Thus, you will rewrite (in steps) the following code.

while (a() && b())
if (b() && !c() && d() && !e())

in this simplified form.

while (false && a() && b())
if (false && !c() && d() && !e())


In other cases, you can benefit from permanently simplifying complex statements in order to ease their debugging. Consider the following statement.

p = s.client(q, r).booking(x).period(y, checkout(z)).duration();


Such a statement is justifiably called a train wreck because it resembles train carriages after a crash. It is difficult to debug because you cannot easily see the return of each method.


You can fix this by adding delegate methods or by breaking the expression into separate parts and assigning each result to a temporary variable.

Client c = s.client(q, r);
Booking b = c.booking(x);
CheckoutTime ct = checkout(z);
Period p = b.period(y, ct);
TimeDuration d = p.duration();


This will make it easier for you to observe the result of each call with the debugger or even to add a corresponding logging statement. Given descriptive type or variable names, the rewrite will also make the code more readable.


Note that the change is unlikely to affect the performance of the code: modern compilers are very good at eliminating unneeded temporary variables.


Another worthwhile type of simplification involves breaking one large function into many smaller parts. Here the benefit for debugging comes mainly from the ability you gain to pinpoint the fault by testing each part individually.


The process may also improve your understanding of the code and untangle unwanted interactions between the parts. These two positive side effects can also lead to a solution to the problem.


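Finally, consider whether a complex, buggy algorithm can be replaced with a simpler one. Several circumstances make this possible. Changes in hardware technologies may make a particular optimization algorithm irrelevant. For example, operating system kernels used to contain a sophisticated elevator algorithm to optimize the movement of disk heads.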


On modern magnetic disks, it is not possible to know the location of a particular data block on the disk platter, so such an algorithm will not do any useful work. Moreover, solid-state disks have in effect zero seek times, allowing you to scrap any complex algorithm or data structure that aims to minimize them.


The functionality of the buggy algorithm may be available in the library of the programming framework you’re using or it may be a mature third-party component. For example, the code for finding a container’s elements’ median value in O(n) time can contain many subtle bugs.


Replacing the C++ code with a call to std::nth_element is an easy way to fix such a flaw. As a larger-scale example, consider replacing a bug-infested proprietary data storage and query engine with a relational database.
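The same move is available in other languages. As a rough Python analogue, the standard library’s `statistics.median` can stand in for a hand-written selection routine. Note that it sorts its input, so it runs in O(n log n) rather than O(n) time; that trade is often acceptable when correctness matters more than the last bit of speed. The wrapper name and the sample data are invented for illustration.

```python
import statistics

def median_response_time(samples):
    """Return the median of a non-empty list of latency samples."""
    return statistics.median(samples)

# Hypothetical latency samples in milliseconds.
print(median_response_time([120, 95, 430, 101, 99]))
```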


A complex algorithm, implemented to improve performance, may have been overkill from day one. Performance optimizations are only justified when profiling and other measurements have demonstrated that optimization work on a particular code hotspot is actually required.


Programmers sometimes ignore this principle, gratuitously creating byzantine, overengineered code. This gives you the opportunity to do away with the code and the bug at the same time.


Modern user experience design favors much simpler interaction patterns than what was the case in the past. This may allow you to replace the buggy spaghetti code associated with a baroque dialog box full of tunable parameters with simpler code that supports a few carefully chosen options and many sensible defaults.


Things to Remember

  1. Selectively prune large code thickets in order to make the fault stand out.
  2. Break complex statements or functions into smaller parts so that you can monitor or test their function.
  3. Consider replacing a complex buggy algorithm with a simpler one.


Fix the Bug’s Cause, Rather Than Its Symptom

A surprisingly tempting way to fix a problem is to hide it under the carpet with a local fix. Here are some examples of a conditional statement “fix” being used:


To avoid a null pointer dereference

if (p != null)
    p.aMethod();

To sidestep the division by zero

if (nVehicleWheels == 0)
    return weight;
return weight / nVehicleWheels;

To shoehorn an incorrect number in a logical range

a = surfaceArea();
if (a < 0)
    a = 0;

To correct a truncated surname

if (surname.equals("Wolfeschlegelsteinha"))
    surname = "Wolfeschlegelsteinhausenbergerdorff";


Some of the preceding statements could have a reasonable explanation. If, however, the conditional was put into place merely to patch a crash, an exception, or an incorrect result, without understanding the underlying cause, then the particular fix is inexcusable. 


Coding around bugs is bad for many reasons.

  • The “fix,” by short-circuiting some functionality, may introduce a new, more subtle bug.
  • By not fixing the underlying cause, other less obvious symptoms of the bug may remain, or the bug may appear again in the future under a different guise.
  • The program’s code becomes needlessly complex and thus difficult to understand and modify.
  • The underlying cause becomes harder to find because the “fix” hides its manifestation (for example, the crash that could direct you to the underlying cause).


Things to Remember

  • Never code around a bug’s symptom: find and fix the underlying fault.
  • When possible, generalize complex cases rather than trying to fix special cases.


Examine Generated Code

Code often gets compiled through a series of transformations from one form to another until it finally reaches the form of the processor’s instructions. For example, a C or C++ file may first get preprocessed, then compiled into assembly language, which is then assembled into an object file; Java programs are compiled into JVM instructions; lexical and parser generation tools, such as lex, flex, and bison, compile their input into C or C++. Various commands and options allow tapping into these transformations to inspect the intermediate code. This can provide you with valuable debugging intelligence.


If the resulting code is anything more than a few lines long, you’ll want to redirect the compiler’s output into a file, which you can then easily inspect in your editor. Here is a simple example demonstrating how you can pinpoint an error by looking at the preprocessed output. Consider the following C code.


#define PI 3.1415926535897932384626433832795;
double toDegrees = 360 / 2 / PI;
double toRadians = 2 * PI / 360;

Compiling it with the Visual Studio 2015 compiler produces the following (perhaps) cryptic error.

t.c(3) : error C2059: syntax error : '/'




If you generate and look at the preprocessed code appearing below, you will see the semicolon before the slash, which will hopefully point you to the superfluous semicolon at the end of the macro definition.

#line 1 "t.c"

double toDegrees = 360 / 2 / 3.1415926535897932384626433832795;;
double toRadians = 2 * 3.1415926535897932384626433832795; / 360;


This technique can be remarkably effective for debugging errors associated with the expansion of complex macros and definitions appearing in third-party header files. When, however, the expanded code is large (and it typically is), locating the line with the culprit code can be difficult.


One trick for finding it is to search for a non-macro identifier or a string appearing near the original line that fails to compile (e.g., toRadians in the preceding case). You can even add a dummy declaration as a signpost near the point that interests you.


Another way to locate the error involves compiling the preprocessed or otherwise generated code after removing the #line directives. The #line directives appearing in the preprocessed file allow the main part of the compiler to map the code it is reading to the lines in the original file.


The compiler can thus accurately report the line of the original (rather than the preprocessed) file where the error occurred. If, however, you’re trying to locate the error’s position in the preprocessed file in order to inspect it, having an error message point you to the original file isn’t what you want.


To avoid this problem, preprocess the code with an option directing the compiler not to output #line directives: -P on Unix systems, /EP for Microsoft’s compilers.


In other cases, it’s useful to look at the generated machine instructions. This can (again) help you get unstuck when you’re stumped by a silly mistake: you look at the machine instructions and you realize you’ve used the wrong operator or the wrong arithmetic type, or you forgot to add a brace or a break statement.


Through the machine code, you can also debug low-level performance problems. To list the generated assembly code, invoke Unix compilers with the -S option and Microsoft’s compilers with the /Fa option. If you use GCC and prefer to see Intel’s assembly syntax rather than the Unix (AT&T) one, you can also specify GCC’s -masm=intel option.


In languages that compile into JVM bytecode, run the javap command on the corresponding class, passing it the -c option. Although assembly code appears cryptic, if you try to map its instructions into the corresponding source code, you can easily guess most of what’s going on, and this is often enough.


Configure Deterministic Builds and Executions

The following program prints the memory addresses associated with the program’s stack, heap, code, and data.

#include <stdio.h>
#include <stdlib.h>
int z;
int i = 1;
const int c = 1;
int main(int argc, char *argv[])
{
    printf("stack:\t%p\n", (void *)&argc);
    printf("heap:\t%p\n", malloc(1));
    printf("code:\t%p\n", (void *)main);
    printf("data:\t%p (initialized)\n", (void *)&i);
    printf("data:\t%p (constants)\n", (void *)&c);
    printf("data:\t%p (zero)\n", (void *)&z);
    return 0;
}

In many environments, each run will produce different results. (I’ve seen this happening with GCC under GNU/Linux, clang under OS X, and Visual C under Windows.)

stack: 003AFDF4
heap: 004C2200
code: 00CB1000
data: 00CBB000 (initialized)
data: 00CB8140 (constants)
data: 00CBCAC0 (zero)

stack: 0028FC68
heap: 00302200
code: 01331000
data: 0133B000 (initialized)
data: 01338140 (constants)
data: 0133CAC0 (zero)

This happens because the operating system kernel randomizes the way the program is loaded into memory in order to hinder malicious attacks against the code. Many so-called code injection attacks work by overflowing a program’s buffers with malicious code and then tricking the program being attacked into executing that code.


This trick is quite easy to pull off if a vulnerable program’s elements are always located at the same memory position. As a countermeasure, some kernels randomize a program’s memory layout, thereby foiling malicious code that attempts to use hard-coded memory addresses.


Unfortunately, this address space layout randomization (ASLR) can also interfere with your debugging. Failures that happen on one run may not occur in another one; the values of pointers you painstakingly record change when you restart the program; address-based hash tables get filled in a different way; some memory managers may change their behavior from one run to another.


Therefore, ensure that your program stays stable between executions, especially when debugging a memory-related problem. On GNU/Linux, you can disable ASLR by running your program as follows.

setarch $(uname -m) -R myprogram


On OS X, you need to pass the -no_pie option to the linker, through the compiler’s -Wl flag. This is the incantation you’ll need to use when compiling.

cc -Wl,-no_pie -o myprogram myprogram.c


There are other, thankfully less severe, ways through which two builds of the same program may differ. Here are some representative ones.

  • Unique symbol names that GCC places into each compiled file. Pass GCC the -frandom-seed option to get consistent names.
  • Varying order of compiler inputs. If the files to be compiled or linked are derived from a Makefile wildcard expansion, their order can differ as directory entries get reshuffled. Specify the inputs explicitly, or sort the wildcard’s expansion.
  • Timestamps embedded in the code to convey the software’s version, through the __DATE__ and __TIME__ macros, for example. Use the revision control system version identifier (e.g., Git’s SHA sum) instead. This will allow you to derive the timestamp should you ever need it.
  • Lists generated from hashes and maps. Some programming language implementations vary how objects are hashed in order to thwart algorithmic complexity attacks, so the order in which such a container’s elements are traversed can change from run to run.
  • Encryption salt. Encryption programs typically perturb the provided key through a randomly derived value (the so-called salt) in order to thwart prebuilt dictionary attacks. Disable the salting when testing and debugging. However, do not disable it for production purposes, as this will make your system vulnerable to dictionary attacks.


The gold standard for build image consistency is to be able to create bit-identical package distributions by compiling the same source code on different hosts. This requires a lot more work because it also involves sanitizing things such as file paths, locales, archive metadata, environment variables, and time zones.


If you need to go that far, consult the reproducible builds website, which offers sound advice on how to tackle these problems.


A number of compilation and linking options allow your code and libraries to perform more stringent runtime checks regarding their operation. These options work in parallel with those that configure your own software’s debug mode, which you should also enable at compile time. The options you’ll see here mainly apply to C, C++, and Objective-C, which typically avoid the performance penalty of buffer bounds checking. Consequently, when these checks are enabled, programs may run noticeably slower.


Therefore, you must apply these methods with care in real-time systems and performance-critical environments. In the following paragraphs, you’ll see some common ways in which you can configure the compilation or linking of your code to pinpoint bugs associated with the use of memory.


You can enable a number of checks on software using the C++ standard template library. With the GNU implementation, you need to define the macro _GLIBCXX_DEBUG when compiling your code, whereas under Visual Studio the checks are enabled if you build your project under debug mode or if you pass the option /MDd to the compiler.


Builds with STL checks enabled will catch things such as incrementing an iterator past the end of a range, dereferencing an iterator of a container that has been destructed, or violating an algorithm’s preconditions.


The GNU C library allows you to check for memory leaks, that is, allocated memory that isn’t freed over the program’s lifetime. To do that, you need to call the mtrace function at the beginning of your program, and then run it with the environment variable MALLOC_TRACE set to the name of the file where the tracing output will go.


Consider the following program, which at the time it exits still has an allocated memory block.


Depending on the compiler you’re using, you may need to provide some extra information in order to get an error report in terms of your source code rather than machine code addresses.


AddressSanitizer is supported on a number of systems, including GNU/Linux, OS X, and FreeBSD running on i386 and x86_64 CPUs, as well as Android on ARM and the iOS Simulator. AddressSanitizer imposes a significant overhead on your code: it roughly doubles the amount of memory and processing required to run your program.


On the other hand, it is not expected to produce false positives, so using it while testing your software is a trouble-free way to locate and remove many memory-related problems.


The facilities used for detecting memory allocation and access errors in Visual Studio are not as advanced as AddressSanitizer, but they can work in many situations. 


Keep in mind that the provided facilities can only identify writes that happen just outside the allocated heap blocks. In contrast to AddressSanitizer, they cannot identify invalid read operations, nor invalid accesses to global and stack memory.


An alternative approach to use under OS X and when developing iOS applications involves linking with the Guard Malloc library. This puts each allocated memory block into a separate (non-consecutive) virtual memory page, allowing the detection of memory accesses outside the allocated pages.


The approach places significant stress on the virtual memory system when allocating the memory but requires no additional CPU resources to check the allocated memory accesses. It works with C, C++, and Objective-C. To use the library, set the environment variable DYLD_INSERT_LIBRARIES to /usr/lib/libgmalloc.dylib.


Several additional environment variables can be used to fine-tune its operation. As an example, the following program, which reads outside an allocated memory block, terminates with a segmentation fault when linked with the library.

int main()
{
    int *a = new int [5];
    int t = a[10];  // Read outside the allocated block
    return 0;
}


You can easily catch the fault with a debugger in order to pinpoint the exact location associated with the error.

Finally, if none of these facilities are available in your environment, consider replacing the library your software is using with one that supports debug checks. One notable such library is dmalloc, a drop-in replacement for the C memory allocation functions with debug support.


Things to Remember

  1. Identify and enable the runtime debugging support offered by your environment’s compiler and libraries.
  2. If no support is available, consider configuring your software to use third-party libraries that offer it.


 Runtime Techniques

The ultimate source of truth regarding a program is its execution. While a program is running, everything comes to light: its correctness, its CPU and memory utilization, even its interactions with buggy libraries, operating systems, and hardware. Yet, typically, this source of truth is also fleeting, rushing into oblivion to the tune of billions of instructions per second.


Worse, capturing that truth can be a tricky, tortuous, or downright treacherous affair. Tests, application logs, and monitoring tools allow you to peek into the program’s runtime behavior to locate the bug that’s bothering you.


 Find the Fault by Constructing a Test Case

You can often pinpoint and even correct a bug simply by working on appropriate tests. Some call this approach DDT for “Defect-Driven Testing”—it is no coincidence that the abbreviation matches that of the well-known insecticide. Here are the three steps you need to follow, together with a running example.


The example is based on an actual bug that appeared in qmcalc, a program that calculates diverse quality metrics for C files and displays the values corresponding to each file as a tab-separated list. The problem was that, in some rare cases, the program would output fewer than the 110 expected fields.


First, create a test case that reliably reproduces the problem you need to solve. This means specifying the process to follow and the required materials (typically data).


For example, a test case can be that loading file foo (material) and then pressing x, y, and z causes the application to crash (process). Another could be that putting Acme’s load balancer (material) in front of your application causes the initial user authentication to fail (process).


In the example case, the following commands apply the qmcalc program on all Linux C files and generate a summary of the number of fields generated.

# Find all C files
find linux-4.4 -name \*.c |
# Apply qmcalc on each file
xargs qmcalc |
# Display the number of fields
awk '{print NF}' |
# Order by number of fields
sort |
# Display number of occurrences
uniq -c
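
The counting idiom can be tried on a small sample before pointing it at a large corpus. What follows is a minimal sketch, assuming tab-separated input; field_histogram is a hypothetical helper name, and the sample rows are invented for illustration (they are not qmcalc output).

```shell
# Hypothetical helper: summarize how many tab-separated fields
# each line of standard input has.
field_histogram() {
  awk -F'\t' '{ print NF }' |   # number of fields on each line
  sort -n |                     # bring equal counts together
  uniq -c |                     # count the occurrences of each value
  awk '{ print $2 " fields: " $1 " line(s)" }'   # normalize uniq's padding
}

# Invented sample: two complete three-field rows and one short row.
printf 'a\tb\tc\nd\te\tf\ng\th\n' | field_histogram
```

A row count that differs from the expected number of fields stands out immediately in the summary, which is what makes such a pipeline usable as a reproducible test case.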


The second step is the simplification of the test case to the bare minimum. Both methods for doing that, building up the test case from scratch or trimming down the existing large test case, involve an Aha! moment, where the bug first appears (when building up) or disappears (when trimming down).


The test case data will often point you either to the problem or even to the solution. In many cases you can combine both trimming methods: you first remove as much fat as possible, and, once you think you know what the problem is, you construct a new minimal test case from scratch.


The third step involves consolidating your victory. Having isolated the problem, grab the opportunity to add a corresponding unit test or regression test in the code.


If the failure is associated with a fault in isolated code parts, then you should be able to add a corresponding unit test. If the failure occurs through the combination of multiple factors, then a regression test is more appropriate.


The regression test should package your test case in a form that can be executed automatically and routinely when the software is tested. While the fault is still in the code, run the software’s tests to verify that the test fails and that it therefore correctly captures the problem.


When the test passes, you have a pretty good indication that you’ve fixed the code. In addition, the test’s existence will now ensure that the fault will not resurface again in the future.


To put this in the words of Andrew Hunt and David Thomas: “Coding ain’t done ‘til all the tests run.”


Adding to your code a test for a problem that’s already solved is not as pedantic as it sounds. First, you may have missed fixing a particular case; the test will help you catch that problem when that code is exercised.


Then, an incorrectly handled revision merge conflict may introduce the same error again. In addition, someone else can commit a similar error in the future. Finally, the test may also catch related errors. There’s rarely a good reason to skimp on tests.


When using tests to uncover bugs, it’s worthwhile to know which parts of the code are actually tested and which parts are skipped over because bugs may lurk in the less well-tested parts. You can find this through the use of a tool that performs test coverage analysis.


Examine Application Log Files

Many programs that perform complex processing, execute in the background, or lack access to a console, log their operations to a file or a specialized log collection facility. A program’s log output allows you to follow its execution in real time or analyze a sequence of events at your own convenience.


In the case of a failure, you may find in the log either an error or a warning message indicating the reason behind the failure (e.g., “Unable to connect to Example Domain (Example Domain): Connection refused”) or data that point to a software error or misconfiguration. Therefore, make it a habit to start the investigation of a failure by examining the software’s log files.


The location and storage method of log files differ among operating systems and software platforms. On Unix systems, logs are typically stored in text files located in the /var/log directory.


Applications may create their own log files in that directory, or they may use an existing file associated with the class of events they’re logging. Some examples include

  1. Authentication: auth.log
  2. Background processes: daemon.log
  3. The kernel: kern.log
  4. Debug information: debug
  5. Other messages: messages

On a not-so-busy system, you may be able to find the file corresponding to the application you’re debugging by running

ls -tl /var/log | head

right after the application creates a log entry; the name of the log should appear at the top among the most recently modified files. If the file is located in a subdirectory of /var/log, you may be able to find it as follows:

# List all files under /var/log
find /var/log -type f |
# List each file's last modification time and name
xargs stat -c '%y %n' |
# Order by time
sort -r |
# List the ten most recently modified files
head


If these methods do not work, you can look for the log filename in the application’s documentation, a trace of its execution or its source code.


On Windows systems, the application logs are stored in an opaque format. You can run Eventvwr.msc to launch the Event Viewer GUI application, which allows you to browse and filter the logs, use the Windows PowerShell Get-EventLog command, or use the corresponding .NET API.


Again, logs are separated into various categories; you can explore them through the tree appearing on the left of the Event Viewer. On OS X, the GUI log viewer application is named Console. Both the Windows and the OS X applications allow you to filter the logs, create custom views, or search for specific entries. You can also use the Unix command-line tools to perform similar processing.


Many applications can adjust the amount of information they log (the so-called log verbosity) through a command-line option, a configuration option, or even at runtime by sending them a suitable signal. Moreover, logging frameworks provide additional mechanisms for expanding or throttling log messages.


When you have debugged your problem, don’t forget to reset logging to its original level; extensive logging can hamper performance and consume excessive storage space or bandwidth.


On Unix systems, applications tag every log message with the associated facility (e.g., authorization, kernel, mail, user) and a level, ranging from emergency and alert to informational and debug.

In the configuration file of the system’s log daemon (traditionally /etc/syslog.conf), you can specify that log messages up to a maximum level (e.g., all messages up to informational, but not the debug ones) and associated with a given facility will be logged to a file, sent to the console, or ignored.


For example, the following specifies the files associated with all messages of the security facility, authorization messages up to the informational level, and all messages of exactly the debug level. It also specifies that messages at the emergency level are sent to all logged-in users.

security.*   /var/log/security
auth.info    /var/log/auth.log
*.=debug     /var/log/debug.log
*.emerg      *


For JVM code, the popular Apache log4j logging framework allows an even more detailed specification of what gets logged and where.


Its structure is based on loggers (output channels), appenders (mechanisms that can send log messages to a sink, such as a file or a network socket), and layouts (these specify the format of each log message).


Log4j is configured through a file, which can be given in XML, JSON, YAML, or Java properties format. Here is a small part of the log4j configuration file used by the Rundeck workflow and configuration management system.


By adjusting the level of messages that get logged, you can often get the data needed to debug a problem. Here is an example associated with debugging failing ssh connections. There, a commented-out line specifies the default log level (INFO).


Bumping up the log level to DEBUG results in many more informative messages, one of which clearly indicates the problem’s cause.

Jul 30 12:57:07 prod sshd[5713]: debug1: Could not open authorized keys '/home/jhd/.ssh/authorized_keys': No such file or directory


There are several ways to analyze a log record in order to locate a failure’s cause.

  1. You can use the system’s GUI event viewer and its searching and filtering facilities.
  2. You can open and process a log file in your editor.
  3. You can filter, summarize, and select fields using Unix tools.
  4. You can monitor the log interactively.
  5. You can use a log management application or service, such as ELK, Logstash, Loggly, or Splunk.
  6. Under Windows, you can use the Windows Events Command Line Utility (wevtutil) to run queries and export logs.


You typically want to start by examining the entries near the time when the failure occurred. Alternately, you can search the event log for a string associated with the failure, for example, the name of a failed command. In both cases, you then scan the log back in time looking for errors, warnings, and unexpected entries.


For failures lacking a clear manifestation, it’s often useful to repeatedly remove from a log file innocuous entries until entries containing important information stand out. You can do this within your editor: under Emacs use M-x delete-matching-lines regular-expression; under vim use :g/regular-expression/d; under Eclipse and Visual Studio find-replace a regular expression that matches the whole line and ends with \n. On the Unix command line, you can pipe the log file through successive grep -v commands.
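
On the command line, the gradual removal of innocuous entries could look like the following. This is only a sketch: strip_noise is a hypothetical helper, and both the patterns and the sample log lines are invented for illustration. In practice you would add one grep -v at a time as you recognize another class of harmless entries.

```shell
# Hypothetical helper: drop log lines known to be harmless,
# so that the remaining entries stand out.
strip_noise() {
  grep -v 'session opened' |
  grep -v 'session closed' |
  grep -v 'CRON'
}

# Invented sample log.
printf '%s\n' \
  'Jul 30 12:57:01 prod CRON[5700]: job started' \
  'Jul 30 12:57:05 prod sshd[5713]: session opened for user jhd' \
  'Jul 30 12:57:07 prod sshd[5713]: Could not open authorized keys' \
  'Jul 30 12:57:09 prod sshd[5713]: session closed for user jhd' |
strip_noise
```

With the three noise patterns removed, only the line reporting the authorized keys problem remains.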

Things to Remember

  1. Begin the investigation of a failing application by examining its log files.
  2. Increase an application’s logging verbosity to record the reason for its failure.


Analyze Debug Data with Unix Command-Line Tools

When you’re debugging you’ll encounter problems no one has ever seen before. Consequently, the shiny IDE that you’re using for writing software may lack the tools to let you explore the problem in detail with sufficient power.


This is where the Unix command-line tools come in. Being general-purpose tools, which can be combined into sophisticated pipelines, they let you analyze text data with little effort.


Line-oriented textual data streams are the lowest useful common denominator for a lot of data that passes through your hands.


Such streams can be used to represent many types of data you encounter when you’re debugging, such as program source code, program logs, version control history, file lists, symbol tables, archive contents, error messages, test results, and profiling figures.


For many routine, everyday tasks, you might be tempted to process the data using a powerful Swiss Army knife scripting language, such as Perl, Python, Ruby, or the Windows PowerShell. This is an appropriate method if the scripting language offers a practical interface to obtain the debug data you want to process and if you’re comfortable developing the scripting command in an interactive fashion.


Otherwise, you may need to write a small, self-contained program and save it into a file. By that point, you may find the task too tedious, and end up doing the work manually, if at all. This may deprive you of important debugging insights.


Often, a more effective approach is to combine programs of the Unix tool chest into a short and sweet pipeline that you can run from your shell’s command prompt. With the modern shell command-line editing facilities, you can build your command bit by bit, until it molds into exactly the form that suits you.


In this section, you’ll find an overview of how to process debug data using Unix commands. If you’re unfamiliar with the command-line basics and regular expressions, consult an online tutorial. Also, you can find the specifics on each command’s invocation options by giving its name as an argument to the man command.


Depending on the operating system you’re using, getting to the Unix command line is trivial or easy. On Unix systems and OS X, you simply open a terminal window.


On Windows, the best course of action is to install Cygwin: a large collection of Unix tools and a powerful package manager ported to run seamlessly under Windows. Under OS X, the Homebrew package manager can simplify the installation of a few tools described here that are not available by default.


Many debugging one-liners that you’ll build around the Unix tools follow a pattern that goes roughly like this: fetching, selecting, processing, and summarizing.


You’ll also need to apply some plumbing to join these parts into a whole. The most useful plumbing operator is the pipeline (|), which sends the output of one processing step as input to the next one.


Most of the time your data will be text that you can directly feed to the standard input of a tool. If this is not the case, you need to adapt your data.

For example, if your C or C++ program exits unexpectedly, you can run nm on its object files to see which ones call (import) the exit function.

# List symbols in all object files prefixed by file name
nm -A *.o |
# List lines ending in U exit
grep 'U exit$'


If you’re working with files grouped into an archive, then a command such as tar, jar, or ar will list the archive’s contents for you. If your data comes from a (potentially large) collection of files, the find command can locate those that interest you.


On the other hand, to get your data over the web, use curl or wget. You can also use dd (and the special file /dev/zero), yes or jot to generate artificial data, perhaps for running a quick benchmark.


Finally, if you want to process a compiler’s list of error messages, you’ll want to redirect its standard error to its standard output or to a file; the incantations 2>&1 and 2>filename will do this trick.
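
The effect of the 2>&1 redirection is easy to see with a stand-in for the compiler. In this sketch, fake_compiler and count_errors are hypothetical helpers written only for illustration; the first writes one progress line to standard output and one diagnostic to standard error.

```shell
# Hypothetical stand-in for a compiler.
fake_compiler() {
  echo 'compiling foo.c'
  echo 'foo.c:3: error: syntax error' >&2
}

# Count the lines mentioning an error that arrive on standard input.
count_errors() {
  awk '/error/ { n++ } END { print n + 0 }'
}

# Only standard output enters the pipe; the diagnostic goes
# straight to the terminal and is not counted.
fake_compiler | count_errors          # prints 0

# 2>&1 merges standard error into standard output before the pipe.
fake_compiler 2>&1 | count_errors     # prints 1
```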


As an example, consider the case in which you’ve changed a function’s interface and want to edit all the files that are affected by the change. One way to obtain a list of those files is the following pipeline.


# Attempt to build all affected files redirecting standard error
# to standard output
make -k 2>&1 |
# Print name of file where the error occurred
awk -F: '/no matching function for call to Myclass::myFunc/ { print $1 }' |
# List each file only once
sort -u


Given the generality of log files and other debugging data sources, in most cases, you’ll have on your hands more data than what you require. You might want to process only some parts of each row or only a subset of the rows.


To select a specific column from a line consisting of fixed-width fields or elements separated by space or another field delimiter, use the cut command. If your lines are not neatly separated into fields, you can often write a regular expression for a sed substitute command to isolate the element you want.
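
As a brief illustration of the two approaches, the sketch below applies cut to a delimiter-separated record and sed to a free-form line. Both sample inputs are invented for illustration.

```shell
# Invented sample record in the colon-separated style of /etc/passwd.
record='jhd:x:1001:1001:J. H. Doe:/home/jhd:/bin/sh'

# cut: select the first and sixth colon-separated fields.
printf '%s\n' "$record" | cut -d: -f1,6

# Invented free-form log line with no clean field separator.
line='Jul 30 12:57:07 prod sshd[5713]: Failed password for jhd from 192.0.2.7 port 4022'

# sed: keep only the address that follows the word "from".
printf '%s\n' "$line" | sed 's/.* from \([0-9.]*\) .*/\1/'
```

The first command prints jhd:/home/jhd and the second prints 192.0.2.7.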


The workhorse for obtaining a subset of the rows is grep. Specify a regular expression to get only the rows that match it, and add the -v flag to filter out rows you don’t want to process.

# List lines containing a division
grep -r ' / ' . |
# Omit those where the division is by sizeof
grep -v '/ sizeof'


Use fgrep (grep for fixed strings) with the -f flag if the elements you’re looking for are plain character sequences rather than regular expressions and if they are stored into a file (perhaps generated in a previous processing step). If your selection criteria are more complex, you can often express them in an awk pattern expression.
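
A fixed-string search driven by a file of patterns might look as follows. The sketch uses grep -F, the POSIX spelling of fgrep; select_listed is a hypothetical helper, and the identifiers and log lines are invented for illustration.

```shell
# Hypothetical helper: print the input lines that contain any of the
# fixed strings listed, one per line, in file $1.
select_listed() {
  grep -F -f "$1"
}

# Invented sample: identifiers produced by an earlier processing step.
ids=$(mktemp)
printf '%s\n' 'req-17' 'req-42' > "$ids"

printf '%s\n' \
  'req-17 started' \
  'req-18 started' \
  'req-42 failed (code=.*)' |
select_listed "$ids"

rm "$ids"
```

Because the patterns are treated as plain character sequences, any regular expression metacharacters they might contain would be matched literally.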


Many times you’ll find yourself combining a number of these approaches to obtain the result that you want. For example, you might use grep to get the lines that interest you, grep -v to filter out some noise from your sample, and finally awk to select a specific field from each line.


For example, the following sequence processes system trace output lines to display the names of all successfully opened files.

# Output lines that call open
grep '^open(' trace.out |
# Remove failed open calls (those that return -1)
grep -v '= -1' |
# Print the second field separated by quotes
awk -F\" '{print $2}'

You’ll find that data processing frequently involves sorting your lines on a specific field. The sort command supports tens of options for specifying the sort keys, their type, and the output order.


Once your results are sorted, you can then efficiently count how many instances of each element you have. The uniq command with the -c option will do the job here; often you’ll post-process the result with another sort, this time with the -n flag specifying a numerical order, to find out which elements appear most frequently.
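
The sort, count, and sort-again idiom can be packaged as shown below. This is a minimal sketch; most_frequent is a hypothetical helper name and the sample error codes are invented.

```shell
# Hypothetical helper: list the distinct input lines, most frequent first.
most_frequent() {
  sort |        # bring identical lines together
  uniq -c |     # prefix each distinct line with its number of occurrences
  sort -rn      # order by that number, largest first
}

# Invented sample: error codes extracted from a log.
printf '%s\n' E42 E07 E42 E13 E42 E07 | most_frequent
```

The result lists E42 (three occurrences) first, followed by E07 (two) and E13 (one); the exact padding of the counts depends on the uniq implementation.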


In other cases, you might want to compare results between different runs. You can use diff if the two runs generate results that should be the same (perhaps the output of a regression test) or comm if you want to compare two sorted lists. You’ll handle more complex tasks, again, using Awk.


As an example, consider the task of investigating a resource leak. A first step might be to find all files that directly call obtainResource but do not include any direct calls to releaseResource. You can find this through the following sequence.

# List records occurring only in the first set
comm -23 <(
# List names of files containing obtainResource
grep -rl obtainResource . | sort) <(
# List names of files containing releaseResource
grep -rl releaseResource . | sort)

(The <(...) sequence is an extension of the bash shell that provides a file-like argument supplying, as input, the output of the process within the brackets.)
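
To experiment with this set-difference idiom on a throwaway directory, something like the following can be used. It is a sketch that requires bash (for the process substitution) and a grep supporting -r; leak_suspects is a hypothetical helper and the two sample files are invented.

```shell
# Requires bash or another shell with process substitution.
# Hypothetical helper: list files under directory $1 that mention
# obtainResource but never mention releaseResource.
leak_suspects() {
  comm -23 \
    <(grep -rl obtainResource "$1" | sort) \
    <(grep -rl releaseResource "$1" | sort)
}

# Invented sample tree: a.c pairs the two calls, b.c does not.
dir=$(mktemp -d)
printf 'obtainResource();\nreleaseResource();\n' > "$dir/a.c"
printf 'obtainResource();\n' > "$dir/b.c"
leak_suspects "$dir"     # lists only the path of b.c
rm -r "$dir"
```

Both inputs to comm must be sorted in the same order, which is why each grep is followed by sort.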


In many cases, the processed data is too voluminous to be of use. For example, you might not care which log lines indicate a failure, but you might want to know how many there are. Surprisingly, many problems involve simply counting the output of the processing step using the humble wc (word count) command and its -l flag.


If you want to know the top or bottom 10 elements of your result list, then you can pass your list through head or tail. Thus, to find the people most familiar with a specific file (perhaps in your search for a reviewer), you can run the following sequence.


# List each line's last modification
git blame --line-porcelain Foo.java |
# Obtain the author
grep '^author ' |
# Sort to bring the same names together
sort |
# Count by number of each name's occurrences
uniq -c |
# Sort by number of occurrences
sort -rn |
# List the top ones
head

The tail command is particularly useful for examining log files. Also, to examine your voluminous results in detail, you can pipe them through more or less; both commands allow you to scroll up and down and search for particular strings.


As usual, use awk when these approaches don’t suit you; a typical task involves summing up a specific field with a command such as sum += $3.


For example, the following sequence will process a web server log and display the number of requests and the average number of bytes transferred in each request.

awk '
# When the HTTP result code is success (200)
# sum field 10 (number of bytes transferred)
$9 == 200 {sum += $10; count++}
# When input finishes, print count and average
END {print count, sum / count}' /var/log/access.log
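
A variant that reads the log from standard input and guards against a log with no successful requests is sketched below. request_stats is a hypothetical helper, the sample entries are invented, and the field positions assume the common log format used in the command above.

```shell
# Hypothetical helper: report the number of successful requests and
# the average number of bytes transferred per successful request.
request_stats() {
  awk '
    $9 == 200 { sum += $10; count++ }
    END {
      if (count > 0)
        print count, sum / count
      else
        print 0, 0     # no successful requests: avoid dividing by zero
    }'
}

# Invented sample entries.
printf '%s\n' \
  '192.0.2.1 - - [30/Jul/2019:12:57:07 +0000] "GET / HTTP/1.1" 200 1000' \
  '192.0.2.2 - - [30/Jul/2019:12:57:08 +0000] "GET /a HTTP/1.1" 404 300' \
  '192.0.2.3 - - [30/Jul/2019:12:57:09 +0000] "GET /b HTTP/1.1" 200 3000' |
request_stats
```

For the sample above the output is 2 2000: two successful requests averaging 2000 bytes each.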


All the wonderful building blocks of Unix are useless without some way to glue them together. For this, you’ll use the Bourne shell’s facilities.


A frequent pattern is to pipe the list of files that find outputs into xargs, which runs a command on them. So common is this pattern that, in order to handle files with embedded spaces in them (such as the Windows “Program Files” folder), both commands support an argument (-print0 and -0, respectively) to have their data terminated with a null character instead of a space.
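
The null-terminated hand-off between find and xargs can be tried out on a scratch directory, as in the following sketch. files_containing is a hypothetical helper, and the directory tree is invented for illustration.

```shell
# Hypothetical helper: list files under directory $2 that contain the
# fixed string $1, even when their names contain spaces.
files_containing() {
  find "$2" -type f -print0 |    # terminate each name with a null character
  xargs -0 grep -lF "$1"         # read the null-terminated names
}

# Invented sample tree with a space embedded in a directory name.
dir=$(mktemp -d)
mkdir "$dir/Program Files"
printf 'access failure\n' > "$dir/Program Files/app.log"
printf 'all good\n'       > "$dir/other.log"
files_containing 'access failure' "$dir"
rm -r "$dir"
```

Without -print0 and -0, xargs would split the path at the space and hand grep two nonexistent file names.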


As an example, consider the task of finding the log file created after you modified foo.cpp that contains the largest number of occurrences of the string “access failure.” This is the pipeline you would write.


# Find all files in the /var/log/acme folder
# that were modified after changing foo.cpp
find /var/log/acme -type f -cnewer ~/src/acme/foo.cpp -print0 |
# Apply fgrep to count the number of 'access failure' occurrences
xargs -0 fgrep -c 'access failure' |
# Sort the :-separated results in reverse numerical order
# according to the value of the second field
sort -t: -rn -k2 |
# Print the top result
head -1


If your processing is more complex, you can always pipe the arguments into a while read loop (amazingly, the Bourne shell allows you to pipe data into and from all its control structures).


For instance, if you suspect that a problem is related to an update of a system’s dynamically linked library (DLL), through the following sequence you can obtain a listing with the version of all DLL files in the windows/system32 directory.

# Find all DLL files
find /cygdrive/c/Windows/system32 -type f -name \*.dll |
# For each file
while read -r f ; do
  # Obtain its Windows path with escaped backslashes
  wname=$(cygpath -w "$f" | sed 's/\\/\\\\/g')
  # Run WMIC query to get its name and version
  wmic datafile where "Name=\"$wname\"" get name, version
done |
# Remove headers and blank lines
grep windows

When everything else fails, don’t shy away from using a couple of intermediate files to juggle your data.
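One way this can look in practice is sketched below: two sorted intermediate files are compared with comm to find records present only in the second. The user lists are hypothetical, and comm is simply one convenient tool for the comparison.

```shell
# Hypothetical data: users seen yesterday and today
work=$(mktemp -d)
printf '%s\n' carol alice bob | sort > "$work/yesterday"
printf '%s\n' bob dave alice | sort > "$work/today"

# Show only the lines unique to the second file
new_users=$(comm -13 "$work/yesterday" "$work/today")
echo "$new_users"
rm -rf "$work"
```

Keeping the intermediate files around while you experiment also lets you inspect each stage separately.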


Things to Remember

  • Analyze debug data through Unix commands that can obtain, select, process, and summarize textual records.
  • By combining Unix commands with a pipeline, you can quickly accomplish sophisticated analysis tasks.


Utilize Command-Line Tool Options and Idioms

The beauty of performing a textual search with grep is that this will work irrespective of the programming language of the code that produces the error message.


This is particularly useful in applications written in multiple languages, or when you lack the time to set up a project within an IDE. Note that fgrep’s recursive search option (-r) is a GNU extension, which purists find distasteful. If you’re working on a system lacking this facility, the following pipeline will perform the same task.

find . -type f | xargs fgrep -l 'Missing foo'


Often the data you’re examining contains a lot of noise: records you don’t want to see. Although you could tailor a grep regular expression to select the records you want, in many cases it’s easier to simply discard the records that bother you using the -v argument of the grep command. Particularly powerful is the combination of multiple such commands.


For example, to obtain all the log records that include the string “Missing foo” but do not contain “connection failure” or “test,” you can use a pipeline such as the following:

fgrep 'Missing foo' *.log |
fgrep -v 'connection failure' |
fgrep -v test
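To see the filtering at work, here is a minimal sketch run against an invented log file. It uses grep -F, the standard spelling of fgrep, because recent GNU grep releases warn that fgrep is obsolescent. The log lines are assumptions for illustration.

```shell
# Create a scratch log with hypothetical records
logdir=$(mktemp -d)
cat > "$logdir/app.log" <<'EOF'
Missing foo in request 17
Missing foo: connection failure
Missing foo during test run
Found foo
EOF

# Select the interesting records, then discard the noise
kept=$(grep -F 'Missing foo' "$logdir"/*.log |
  grep -F -v 'connection failure' |
  grep -F -v test)
echo "$kept"
rm -rf "$logdir"
```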


The output of the grep command consists of the lines that match the specified regular expression. However, if those lines are long, it may be difficult to see the part of the line where the culprit occurs.


For example, you might believe that a display problem associated with a (badly formatted) HTML file has to do with a table tag. How can you quickly inspect all such tags? Passing the --color option to grep, as in grep --color table file.html will show all the table tags in red, simplifying their inspection.


By convention, programs that run on the command line do not send errors to their standard output. Doing that might confuse other programs that process their output and also hide the error message from the program's human operator if the output is redirected into a file.


Instead, error messages are sent to a different channel, called the standard error. This will typically appear on the terminal through which the command was invoked, even if its output was redirected.


However, when you’re debugging a program you might want to process that output, rather than see it fly away on the screen. Two redirection operators can help you here. First, you can send the standard error (by convention, file descriptor 2) into a file for later processing by specifying 2>filename when running the program.
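A minimal sketch of the first operator follows. The noisy function is a hypothetical stand-in for the program being debugged; it writes one line to each channel.

```shell
# Hypothetical program writing to both channels
noisy() {
  echo 'result line'
  echo 'warning: something odd' >&2
}

# Capture standard output normally; divert standard error to a file
errfile=$(mktemp)
out=$(noisy 2>"$errfile")

# The error file can now be processed like any other text
errs=$(grep -c 'warning' "$errfile")
echo "$out / $errs warning(s)"
rm -f "$errfile"
```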


You can also redirect the standard error to the same file descriptor as the standard output (file descriptor 1), so that you can process both with the same pipeline. For example, the following command passes both outputs through more, allowing you to scroll through the output at your own pace.


program 2>&1 | more
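The difference this makes can be checked with a small sketch. The chatty function is a hypothetical program that writes one line to each channel, and counting lines stands in for whatever processing you have in mind.

```shell
# Hypothetical program writing one line to each channel
chatty() {
  echo 'normal output'
  echo 'error output' >&2
}

# Without 2>&1, only the standard output enters the pipeline
only_stdout=$(chatty 2>/dev/null | wc -l | tr -d ' ')
# With 2>&1, both lines flow through the same pipe
both=$(chatty 2>&1 | wc -l | tr -d ' ')
echo "$only_stdout $both"
```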

When debugging non-interactive programs, such as web servers, all the interesting action is typically recorded into a log file. Rather than repeatedly viewing the file for changes, the best thing to do is to use the tail command with the -f option to examine the file as it grows.


The tail command will keep the log file open and register an event handler to get notifications when the file grows. This allows it to display the log file’s changes in an efficient manner.


If the process writing the file is likely at some point to delete or rename the log file and create a new one with the same name (e.g., to rotate its output), then passing the --follow=name option to tail will instruct it to follow the file with that name rather than the file descriptor associated with the original file.


Once you have tail running on a log file, it pays to keep it in a separate (perhaps small) window that you can easily monitor as you interact with the application you’re debugging. If the log file contains many irrelevant lines, you can pipe the tail output into grep to isolate the messages that interest you.

sudo tail -f /var/log/maillog | fgrep 'max connection rate'


If the failures you’re looking for are rare, you should set up a monitoring infrastructure to notify you when something goes wrong. For one-off cases, you can arrange to run a program in the background even after you log off by suffixing its invocation with an ampersand and running it with the nohup utility. You will then find the program’s output and errors in a file named nohup.out.
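A minimal sketch of the nohup idiom is shown below. The job is a hypothetical stand-in for a long-running program, and its output is redirected explicitly to a scratch file rather than left to land in nohup.out, so that the sketch cleans up after itself.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for a long-running job, started in the
# background and shielded from the hangup signal sent at logout
nohup sh -c 'sleep 1; echo "regression finished"' \
  > "$workdir/run.out" 2>&1 &
job=$!

# You could log off here; this sketch simply waits for the job
wait "$job"
result=$(grep -c 'regression finished' "$workdir/run.out")
echo "$result"
rm -rf "$workdir"
```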


Or you can pipe a program’s output to the mail command so that you will get it when it finishes. For runs that will terminate within your workday, you can set a sound alert after the command.

long-running-regression-test ; printf '\a'


You can even combine the two techniques to get an audible alert or a mail message when a particular log line appears.

sudo tail -f /var/log/secure |
fgrep -q 'Invalid user' ; printf '\a'

sudo tail -f /var/log/secure |
fgrep -m 1 'Invalid user' |
mail -s Intrusion jdh@example.com


Modifying the preceding commands with the addition of a while read loop can make the alert process run forever. However, such a scheme enters into the realm of an infrastructure monitoring system for which there are specialized tools.
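Such a loop might be sketched as follows. The alert_on function name, the pattern, and the sample log lines are all assumptions for illustration, and the sketch prints a text alert where a real setup might ring the bell with printf '\a' or send mail.

```shell
# Read log lines from standard input and emit an alert
# for every line that contains the given fixed string
alert_on() {
  pattern=$1
  while IFS= read -r line ; do
    case $line in
      *"$pattern"*) printf 'ALERT: %s\n' "$line" ;;
    esac
  done
}

# Feed the loop a finite, made-up sample instead of a live log
alerts=$(printf '%s\n' \
  'Accepted password for alice' \
  'Invalid user admin from 203.0.113.5' \
  'Invalid user test from 203.0.113.9' |
  alert_on 'Invalid user')
echo "$alerts"
```

Fed from tail -f instead of a fixed sample, the same loop would keep running for as long as the log keeps growing.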

Things to Remember

  1. Diverse grep options can help you narrow down your search.
  2. Redirect a program’s standard error in order to analyze it.
  3. Use tail -f to monitor log files as they grow.