20+ Debugging Tools (2019)

A number of specialized tools are available for debugging code. In this blog, we explain more than 20 of the best debugging tools for tracking down bugs and improving your code's performance.

 

Monitoring and Tracing tools

Monitoring and tracing tools and facilities allow you to derive log-like data from the execution of arbitrary programs. This approach offers you a number of advantages over application-level logging.

  1. You can obtain data even if the application you’re debugging lacks logging facilities.
  2. You don’t need to prepare a debug version of the software which may obfuscate or hide the original problem.
  3. Compared to the use of a GUI debugger, it’s lightweight, which allows you to use it in a bare-bones production environment.

 

When you try to locate a bug, the approach you often use involves either inserting logging statements in key locations of your program or running the code under a debugger, which allows you to dynamically insert breakpoint instructions.

 

Nowadays, however, performance problems and many bugs involve the use of third-party libraries or interactions with the operating system.

 

Configure and filter log files to narrow down the problem.

When debugging performance problems, your first (and often the only) port of call is a profile of the system’s operation. This will analyze the system’s resource utilization and thereby point to a part that is misbehaving or needs to be optimized.

 

Start by obtaining a high-level overview. Two process-viewing tools that will also give you a system's CPU and memory utilization are the top command on Unix systems and the Task Manager on Windows.

 

On a misbehaving system, a high level of CPU utilization (say, 90% on a single core CPU) tells you that you must concentrate your analysis on processing, whereas a low utilization (say, 10%, again on a single core CPU) points to delays that may be occurring due to input/output (I/O) operations.

 

Note that multi-core computers typically report the load over all CPU cores, so if you're dealing with a single-threaded process, divide the thresholds I gave by the total number of available CPU cores. For example, on an eight-core system, a single process occupying 100% of one CPU core on an otherwise idle machine will make the overall load appear as roughly 12% (100% ÷ 8).

 

Also, look at the system’s physical memory utilization. A high (near 100%) utilization may cause errors due to failed memory allocations or a drop in the system’s performance due to virtual memory paging.

 

When looking at the amount of free memory, keep in mind that Linux systems aggressively use almost all available memory as a buffer cache. Therefore, on Linux systems add the memory listed as buffers to the amount of memory you consider to be free.
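
For example, on a reasonably recent Linux system (older releases format this output differently), the free command does the arithmetic for you:

free -h         # the "available" column counts reclaimable buffers and cache as free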

 

For systems designed to operate normally near their maximum capacity, you need to look beyond utilization (which will be close to 100%) and examine saturation.

 

This is a measure of the demands placed on a resource above the level it can service. For this, you will use the same tools but focus on measures indicating the saturation of each resource.

  1. For CPUs, look at a load higher than the number of cores on Unix systems, and at the Performance Monitor's System – Processor Queue Length counter on Windows systems.
  2. For memory, look at the rate at which virtual memory pages are written out to disk.
  3. For network I/O, look for dropped packets and retransmissions.
  4. For storage I/O, look at the request queue length and operation latency.
  5. For all of the above measures, levels of saturation consistently appearing above 100% (continuously or in bursts) are typically a problem; the command sketch after this list shows some ways to sample them.
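
On a Linux system, for instance, you can sample several of these saturation measures from the command line. The commands below assume the procps and sysstat packages; column names vary between versions.

vmstat 5        # the "si"/"so" columns show pages swapped in and out per second
sar -n EDEV 5   # per-interface packet drop and error counters
iostat -x 5     # per-device request queue length and operation latency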

 

Having obtained an overview of what’s gumming up your system’s performance, drill down toward the process burning many CPU cycles, causing excessive I/O, experiencing high I/O latency, or using a lot of memory. 

 

If the problem is a high CPU load, look at the running processes. Order the processes by their CPU usage to find the culprit taking up most of the CPU time. 

 

If the problem is high memory utilization, look at the running processes ordered by the working set (or resident) memory size. This is the amount of physical (rather than virtual) memory used.

 

Once you’ve isolated the type of load that’s causing you a problem, use pidstat on Unix or the Windows Task Manager to pinpoint the culprit process. Then trace the individual process's system calls to further understand its behavior.
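
For example, assuming the sysstat package's pidstat is available, the following commands break the load down per process at five-second intervals:

pidstat 5       # per-process CPU usage
pidstat -r 5    # per-process memory (resident set) usage
pidstat -d 5    # per-process disk I/O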

 

For cases of high CPU or memory utilization, you should continue by profiling the behavior of the culprit process you’ve identified. There’s no shortage of techniques for monitoring a program’s behavior.

 

If you care about CPU utilization, you can run your program under a statistical profiler that will interrupt its operation many times every second and note where the program spends most of its time.
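
On Linux, for instance, the perf tool works in exactly this way; a minimal session might look like the following, where ./myprog stands in for your own program:

perf record -F 99 -g ./myprog   # sample roughly 99 times a second, recording call stacks
perf report                     # browse the functions in which the samples landed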

 

Alternately, you can arrange for the compiler or runtime system to plant code setting a time counter at the beginning and end of each (non-inlined) function, and thus create a graph-based profile of the program’s execution.

 

This allows you to attribute the activity of each function to its parents, and thereby untangle performance problems associated with complex call paths. 
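
With GCC, for example, the classic gprof workflow follows this scheme (myprog.c is a placeholder):

gcc -pg -o myprog myprog.c      # instrument every function entry and exit
./myprog                        # running the program writes profiling data to gmon.out
gprof myprog gmon.out           # print the flat and call-graph profiles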

 

Other profiling options you can use include the Eclipse and NetBeans profiler plugins and the stand-alone VisualVM, JProfiler, and Java Mission Control systems for Java programs, as well as the CLR Profiler for .NET code.

 

Memory utilization monitors typically modify the runtime system’s memory allocator to keep track of your allocations. Valgrind under Unix and, again, VisualVM and Java Mission Control are useful tools in this category. Aspect Oriented Programming tools and frameworks, such as AspectJ and Spring AOP, allow you to orchestrate your own custom monitoring.

 

At an even lower level, you can monitor the CPU's performance counters with tools such as perf, OProfile, or perfmon2 to look for cache misses, missed branch predictions, or instruction fetch stalls.
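
As an illustration with perf (the available event names differ between CPUs):

perf stat -e cache-misses,branch-misses,instructions ./myprog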

 

Things to Remember

  1. Analyze performance issues by looking at the levels of CPU, I/O, and memory utilization and saturation.
  2. Narrow down on the code associated with a performance problem by profiling a process’s CPU and memory usage.

One way to resolve issues that involve a third-party library or the operating system is to look at the calls from your code to that other component. By examining the timestamp of each call or looking for an abnormally large number of calls, you can pinpoint performance problems.

 

Tracing tools

The arguments to a function can also often reveal a bug. Call tracing tools include ltrace (which traces library calls), strace, ktrace, and truss (which trace operating system calls) under Unix, JProfiler for Java programs, and Process Monitor under Windows (which traces DLL calls, covering both operating system and third-party library interfaces).

 

These tools typically work by using special APIs or code-patching techniques to hook themselves between your program and its external interfaces.

 

As an example, in one debugging session, once there was a compact and reliable way to reproduce the problem, it was easy to write a shim class that would independently calculate and cache the file's offset, eliminating the expensive calls to tellg.

 

Processing the output of strace with Unix tools immensely increases your debugging power. Consider the case where a program fails by complaining about an erroneous configuration entry.

 

However, you can’t find the offending string in any of its tens of configuration files. The following Bash command will show you which of the files opened by the program prog contains the offending string, say, xyzzy.
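
A sketch of such a pipeline appears below; the exact strace output format varies between versions, so treat the sed expression as illustrative rather than definitive.

strace -e trace=open prog 2>&1 >/dev/null |
sed -n 's/^open("\([^"]*\)".*$/\1/p' |
grep -v '^/dev/' |
sort -u |
xargs grep -l xyzzy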

 

It works by sending the output of strace into a pipeline that isolates the names of files passed to the open system call (sed), removes the file-names associated with devices, keeps a unique copy of each filename (sort -u), and looks for the string xyzzy within those files.

 

Looking at the system calls of Java and X Window System programs can be irritating because of the large number of calls associated with the runtime framework. These calls can obscure what the program actually does. Thankfully, you can filter out these system calls with the strace -e option.

 

Note that you can also trace an already running program by attaching the tracing tool to it. The command-line tools offer the -p option, whereas the GUI tools allow you to click on the process you want to trace.
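
For instance, with strace (12345 stands for the process ID you want to examine):

strace -p 12345                   # attach to a running process and trace its system calls
strace -p 12345 -e trace=network  # restrict the trace to network-related calls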

 

System and library call tracing is not the only game in town. Most interpreted languages offer an option to trace a program's execution. Here are the incantations for tracing code written in some popular scripting languages.

Perl: perl -d:Trace
Python: python -m trace --trace
Ruby: ruby -r tracer
Unix shell: sh -x, bash -x, csh -x, etc.

 

Other ways to monitor a program's operation include the JavaScript tracing backend spy-js, network packet monitoring, and the logging of an application's SQL statements via the database server. For example, the following SQL statements turn on this logging for MySQL.

set global log_output = 'FILE';

set global general_log_file = '/tmp/mysql.log';

set global general_log = 1;

 

Most of the tools referred to so far have been around for ages and can be valuable for solving a problem, once you’ve located its approximate cause.

 

They also have a number of drawbacks: they often require you to take special actions to monitor your code, they can decrease the performance of your system, their interfaces are idiosyncratic and incompatible with each other, each one shows you only a small part of the overall picture, and sometimes important details are simply missing.

 

A tool that addresses these shortcomings is DTrace, a dynamic tracing framework developed originally by Sun that provides a uniform mechanism for monitoring comprehensively and unobtrusively the operating system, application servers, runtime environments, libraries, and application programs.

 

It is currently available on Solaris, OS X, FreeBSD, and NetBSD. On Linux, SystemTap and LTTng offer similar facilities.

 

Unsurprisingly, DTrace, a gold winner in The Wall Street Journal’s Technology Innovation Awards contest, is not a summer holiday hack.

 

The three Sun engineers behind it worked for a number of years to develop mechanisms for safely instrumenting all operating system kernel functions, any dynamically linked library, any application program function or specific CPU instruction, and the Java virtual machine.

 

They also developed a safe interpreted language that you can use to write sophisticated tracing scripts without damaging the operating system’s functioning, and aggregating functions that can summarize traced data in a scalable way without excessive memory overhead.

 

DTrace Tool

DTrace integrates technologies and wizardry from most existing tracing tools and some notable interpreted languages to provide an all-encompassing platform for program tracing.

 

You typically use the DTrace framework through the dtrace command-line tool. You feed the DTrace tool with scripts you write in a domain-specific language named D (not related to the general-purpose language with the same name).

 

When you run dtrace with your script, it installs the traces you’ve specified, executes your program, and prints its results. D programs can be very simple: they consist of pattern/action pairs like those found in the awk and sed tools and many declarative languages.

 

A pattern (called a predicate in the DTrace terminology) specifies a probe—an event you want to monitor. DTrace comes with thousands of pre-defined probes (49,979 on an early version of Solaris and 177,398 on the OS X El Capitan system I tried it on).

 

In addition, system programs (such as application servers and runtime environments) can define their own probes, and you can also set a probe anywhere you want in a program or in a dynamically linked library. For example, the command

 

dtrace -n 'syscall:::entry'

will install a probe at the entry point of all operating system calls, and the (default) action will be to print the name of each system call executed and the process-id of the calling process. You can combine predicates and other variables together using Boolean operators to specify more complex tracing conditions.

 

The name syscall in the previous invocation specifies a provider—a module providing some probes. Predictably, the syscall provider offers probes for tracing operating system calls: 500 system calls on my system. The name syscall::open:entry designates one of these probes—the entry point to the open system call.

 

DTrace contains tens of providers, giving access to statistical profiling, all kernel functions, locks, system calls, device drivers, input and output events, process creation and termination, the network stack's management information bases (MIBs), the scheduler, virtual memory operations, user program functions, arbitrary code locations, synchronization primitives, kernel statistics, and Java virtual machine operations.

 

Here are the commands you can use to find the available providers and probes.

List all available probes: dtrace -l

List system call probes: dtrace -l -P syscall

List the arguments to the read system call probe: dtrace -lv -f syscall::read

 

Together with each predicate, you can define an action. This action specifies what DTrace will do when a predicate’s condition is satisfied. For example, the following command

dtrace -n 'syscall::open:entry {trace(copyinstr(arg0));}'

will list the name of each opened file.

 

Actions can be arbitrarily complex: they can set global or thread-local variables, store data in associative arrays, and aggregate data with functions such as count, min, max, avg, and quantize.

 

For instance, the following program will summarize the number of times each process gets executed over the lifetime of the DTrace invocation.

proc:::exec-success { @proc[execname] = count(); }

 

By tallying functions that acquire resources and those that release them, you can easily debug leaks of arbitrary resources. In typical use, DTrace scripts span the space from one-liners, such as the preceding ones, to tens of lines containing multiple predicate action pairs.
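
As a sketch of this idea, the following two clauses (prog names a hypothetical process) tally how many times the process enters the open and close system calls; a persistent gap between the two counts points to a file descriptor leak.

syscall::open:entry /execname == "prog"/ { @calls["open"] = count(); }
syscall::close:entry /execname == "prog"/ { @calls["close"] = count(); }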

 

If your code runs on a JVM, another tool you might find useful for tracking its behavior is Byteman. 

 

This can inject Java code into the methods of your application or of the runtime system, without requiring you to re-compile the code. You specify when and how the original Java code is transformed through a clear and simple scripting language.
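
A minimal sketch of such a rule might look like the following; the class and method names are hypothetical, and traceln is one of Byteman's built-in helper actions.

RULE trace withdrawals
CLASS com.example.Account
METHOD withdraw
AT ENTRY
IF true
DO traceln("withdraw called with amount " + $1)
ENDRULE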

 

The advantages of using Byteman over adding logging code by hand are threefold. First, you don’t need access to the source code, which allows you to trace third-party code as well as yours.

 

Also, you can inject faults and other similar conditions in order to verify how your code responds to them. Finally, you can write Byteman scripts that will fail a test case if the application’s internal state diverges from the expected norm.

 

On the Windows ecosystem, similar functionality is provided by the Windows Performance Toolkit, which is distributed as part of the Windows Assessment and Deployment Kit.

 

The system has a recording component, the Windows Performance Recorder, which you run on the system facing performance problems to trace events you consider important, and the Windows Performance Analyzer, which, in true Windows fashion, offers you a nifty GUI to graph results and operate on tables.

 

Things to Remember

  1. System and library call tracing allows you to monitor the behavior of programs without access to their source code.
  2. Learn how to use the Windows Performance Toolkit (Windows), SystemTap (Linux), or DTrace (OS X, Solaris, FreeBSD).

 

Use Dynamic Program Analysis Tools

A number of specialized tools can instrument your compiled program with check routines, monitor its execution, and report detected cases of probable errors. This type of checking is termed dynamic analysis because it is carried out at runtime.

 

The corresponding checks complement the techniques discussed in 51: “Use Static Program Analysis,” such as writing "use strict"; in JavaScript and use strict; use warnings; in Perl code, which enable both static and dynamic checks.

 

Compared to static analysis tools, dynamic tools have an easier job in detecting errors that actually occur because, rather than having to deduce what code would be executed (as is the case with static analysis tools), they can trace the code as it is being executed. This means that when a dynamic analysis tool indicates an error, it is highly unlikely that this is a false positive.

 

On the other hand, a dynamic analysis tool will only look at the code that’s actually being executed. Therefore, it can miss faults that are located in code paths that aren’t exercised, resulting in a potentially large number of false negatives.

 

Because dynamic program analysis tools often dramatically slow down a program’s execution and can report a slew of low-priority errors, when debugging it’s best to employ such a tool with a very specific test script that demonstrates the exact problem you’re debugging.

 

Alternately, as a code hygiene maintenance method, you can run the code being analyzed with a realistic and complete test scenario. Through this process, you can whitelist all reported errors so that you can easily catch any new ones that appear when you introduce changes.

 

Many dynamic analysis tools offer facilities to detect the use of uninitialized values, memory leaks, and accesses beyond the boundaries of available memory space.

 

Other tools can catch security vulnerabilities, suboptimal code, incomplete code coverage (this indicates gaps in your testing), implicit type conversions, dynamic typing inconsistencies, and numeric overflows.

 

You can also read how you can use dynamic analysis tools to catch concurrency errors in 62: “Uncover Deadlocks and Race Conditions with Specialized Tools.” Wikipedia’s page on dynamic program analysis lists tens of tools; choose those that match your environment, problem, and budget.

 

A widely used open-source dynamic analysis system is the Valgrind tool suite, which contains a powerful memory-checking component. Consider the following program, which crams into three lines of code a memory leak, an illegal memory access, and the return of an uninitialized value.
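
A small C program along these lines might look as follows (a reconstruction for illustration, not the author's exact listing):

#include <stdlib.h>

int main(void)
{
    int i, *a = malloc(10 * sizeof(int));  /* the allocated block is never freed: memory leak */
    i = a[10];                             /* reads one element past the end: illegal access */
    return i;                              /* the value returned was never initialized */
}

Running it under Valgrind's Memcheck tool, for example with valgrind --leak-check=full ./a.out, reports all three problems.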

 

Another interesting tool is the Jalangi dynamic analysis framework for client- and server-side JavaScript. This transforms your JavaScript code into a form that exposes the code's execution through an API.

 

You can then write verification scripts that get triggered when specific things happen, such as the evaluation of a binary arithmetic operation. You can use such scripts to pinpoint various problems in JavaScript code. 

 

Use Static Program Analysis Tools

Having tools do the debugging for you sounds too good to be true, but it’s actually a realistic possibility.

 

A variety of so-called static analysis tools can scan your code without running it (that’s where the word “static” comes from) and identify apparent bugs. Some of these tools may already be part of your infrastructure because modern compilers and interpreters often perform basic types of static analysis.

 

Stand-alone tools include GrammaTech CodeSonar, Coverity Code Advisor, FindBugs, Polyspace Bug Finder, and various programs whose names end in “lint.” The analysis tools base their operation on formal methods (algorithms based on lots of math) and on heuristics (an important-sounding word for informed guesses).

 

Although ideally static analysis tools should be applied continuously during the software’s development to ensure the code’s hygiene, they can also be useful when debugging to locate easy-to-miss bugs, such as concurrency gotchas and sources of memory corruption.

 

Some of the static analysis tools can detect hundreds of different bugs. In practice, the code associated with an error is typically convoluted through other statements or distributed among many routines, so it is rarely as easy to spot as in a toy example.

 

No tool will ever be perfect due to both practical limitations (memory required to track the exponential explosion of the state space) and theoretical constraints.

 

Therefore, although static analysis is useful, you’ll sometimes need to use your judgment regarding the correctness of the results it provides you, and be on the lookout for cases it misses.

 

Your first port of call to obtain the benefits of static analysis should be the compiler or interpreter you’re using. Some provide options that will make them check your code more strictly and issue a warning when they encounter questionable code. For example,

 

A starting point of options for GCC, GHC (the Glasgow Haskell Compiler), and clang (the C language family front end of the LLVM compiler) is -Wall, -Wextra, and -Wshadow (many more are available).

 

In Perl you write use strict; and use warnings;

(The Perl and JavaScript options enable both static and dynamic checks.) In compilers, also specify a high optimization level: this performs the type of analysis needed for generating some of the warnings.
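
Putting these together, a representative invocation (prog.c is a placeholder) might be:

gcc -O2 -Wall -Wextra -Wshadow -c prog.c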

 

If the level of warnings can be adjusted, choose the highest level that will not drown you in warnings about things you’re unlikely to fix. Then methodically remove all other warnings. This may fix the fault you’re looking for, and may also make it easier to see other faults in the future.

 

Having achieved zero warnings, take the opportunity to adjust the compilation options so that they will treat warnings as errors (/WX for Microsoft’s compilers, -Werror for GCC). This will prevent warnings from being missed in a lengthy compilation’s output, and will also compel all developers to write warning-free code.

 

Having securely anchored the benefits of configuring your compiler, it’s time to set up additional analysis tools. These can detect more bugs at the expense of longer processing times and, often, more false positives.

 

Wikipedia’s static analysis tools list contains more than 100 entries. The list includes both popular commercial offerings, such as Coverity Code Advisor, and widely used open-source software, such as FindBugs.

 

Some focus on particular types of bugs, such as security vulnerabilities or concurrency problems. Choose those that better target your particular needs.

 

Feel free to adopt more than one tool because different tools often complement each other in the bugs they detect. Invest effort to configure each tool in a way that minimizes the spurious warnings it issues by turning off the warnings that don’t apply to your coding style.

 

Finally, make it possible to run the static analysis step as part of the system’s build, and make it a part of the continuous integration setup.

 

The build configuration will make it easy for developers to check their code with the static analysis tools in a uniform way. The check during continuous integration will immediately report any problems that slip past developers.
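
A minimal sketch of such a hook, assuming the open-source cppcheck analyzer and a src/ directory (both illustrative): have a build target or continuous integration job run

cppcheck --enable=warning,style --error-exitcode=1 src/

so that the build fails whenever the analyzer reports a problem.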

 

This setup will ensure that the code is always clean from errors reported by static analysis tools. All too often, a team will embark on a heroic effort to clean static analysis errors (perhaps while chasing an insidious bug), and then lose interest and let new errors creep in.

 

Things to Remember

  1. Specialized static program analysis tools can identify more potential bugs in code than compiler warnings.
  2. Configure your compiler to analyze your program for bugs.
  3. Include in your build cycle and continuous integration cycle at least one static program analysis tool.
