How to Debug


To remove bugs and improve code performance, we rely on a range of debugging tools and techniques. This blog explains how to debug your code efficiently and effectively.


Drill Up from the Problem to the Bug

There are generally two ways to locate the source of a problem. You can either start from the problem’s manifestation and work up toward its source, or you can start from the top level of your application or system and work down until you find the source of the problem.


Depending on the type of problem, one approach is usually more effective than the other. However, if you reach a dead end going one way, it may be helpful to switch directions.


Use the Software’s Debugging Facilities

Programs are complex beasts, and for this reason, they often contain built-in debugging support. This can, among other things, achieve the following:


  1. Make the program easier to debug by disabling features such as background or multi-threaded execution
  2. Allow the precise targeting of a failing test case through its selective execution
  3. Provide reports and other intelligence regarding performance
  4. Introduce additional logging

Therefore, invest some effort to find what debugging facilities are available in the software you’re debugging. Searching the program’s documentation and source code for the word debug is an excellent starting point.


This can point you to command-line options, configuration settings, build options, signals (on Unix systems), registry settings (on Windows), or command-line interface commands that will enable the program’s debug mode.


Typically, setting debugging options will make the program’s operation more transparent through verbose output and, sometimes, simpler operation. Use the expanded log output to explore the reasons behind a failure you’re witnessing. Here are a few examples.


A simple case of debugging functionality involves having a command detail its actions. For example, the Unix shells offer the -x option to display the commands they execute. This is useful for debugging tricky text substitution problems.


Often, a number of options can be combined to set up an execution that’s suitable for debugging a problem. Consider troubleshooting a failed ssh connection.


Instead of modifying the global sshd configuration file or keys, which risks locking everybody out, you can invoke sshd with options that specify a custom configuration file to use (-f) and a port distinct from the default one (-p).


Adding the -d (debug) option will run the process in the foreground, displaying debug messages on the terminal. These are the commands that will be run on the two hosts where the connection problem occurs.


# Command run on the server side
sudo /usr/sbin/sshd -f ./sshd_config -d -p 1234

# Command run on the client side
ssh -p 1234


Another type of debugging functionality allows you to target a precise case. Consider trying to understand why a specific email message out of the thousands being delivered on a busy host faces a delivery problem.


This can be examined by invoking the Exim sendmail command with the verbose (-v) and message delivery (-M) options followed by the identifier of the failing message.

sudo sendmail -v -M 1ZkIDm-0006BH-0X


Things to Remember

Identify what debugging facilities are available in the software you’re troubleshooting, and use them to investigate the problem you’re examining.


Diversify Your Build and Execution Environment

Sometimes you can pin down subtle, elusive bugs by changing the playing field. You can do this by building the software with another compiler, or by switching the runtime interpreter, virtual machine, middleware, operating system, or CPU architecture.


This works because the other environment may perform stricter checks on the inputs you supply to some routines or because its structure amplifies your mistake.


Therefore, if you experience application instability, crashes that you can’t replicate, or portability problems, try testing your software on another setup.


Such a change can also allow you to use more advanced debugging tools, such as a nifty graphical debugger or dtrace. Compiling or running your software on another operating system can unearth incorrect assumptions regarding API usage.


As an example, some C and C++ header files declare more entities than are strictly needed, which may lull you into forgetting to include a required header, resulting, again, in portability problems for your customers. Also, some API implementations can vary significantly between operating systems:


Solaris, FreeBSD, and GNU/Linux ship with different implementations of the C library, while the desktop and mobile versions of the Windows API currently rely on different code bases. Note that these differences can also affect interpreted languages that use the underlying C libraries and APIs, such as JavaScript, Lua, Perl, Python, or Ruby.


In languages that run close to the hardware, such as C and C++, the underlying processor architecture can influence a program’s behavior.


Over the past decades, the dominance of Intel’s x86 processor architecture on the desktop and the ARM architecture on mobile devices has reduced the popularity of architectures with significant differences in byte ordering (SPARC, PowerPC) or even the handling of null pointer indirection (VAX).


Nevertheless, differences in the handling of misaligned memory accesses and memory layout still exist between the x86 and ARM architectures.


For example, accessing a two-byte value on an odd memory address can generate a fault on some ARM CPUs and may result in non-atomic behavior. On other architectures, misaligned memory accesses may severely affect an application’s performance.


Furthermore, the size of structures and the offsets of their members can differ between the two architectures, especially when using earlier compiler versions.


More importantly, the sizes of primitive elements, such as long and pointer values, change as you move code from 32-bit to 64-bit architectures or from one operating system to another. Consider the following program, which displays the sizes of five primitive elements.

#include <stdio.h>

int main(void)
{
    /* Print the sizes, in bytes, of five primitive types */
    printf("S=%zu I=%zu L=%zu LL=%zu P=%zu\n", sizeof(short), sizeof(int), sizeof(long),
        sizeof(long long), sizeof(char *));
    return 0;
}

Therefore, running your software on another architecture or operating system can help you debug and detect portability problems.


On mobile platforms, there’s huge variation not only in the version of the operating system (most phone and tablet manufacturers ship their own modified version of the Android operating system), but also in the hardware: screen resolution, interfaces, memory, and processor.


This makes it even more important to be able to debug your software on a variety of such platforms. To address this problem, mobile app development groups often maintain a stock of many different devices.


There are three main ways to debug your code in another execution environment.

You can use virtual machine software on your workstation to install and run diverse operating systems. This approach has the added advantage of providing you with an easy way to maintain a pristine image of the execution environment: just copy the configured virtual machine image to the “master” file, which you can restore when needed.


You can use small inexpensive computers. If the architecture you’re mainly targeting is x86, an easy way to get your hands on an ARM CPU is to keep at hand a Raspberry Pi.


This miniature ARM-based device runs many popular operating systems. It’s easy to plug into an Ethernet switch or connect it via Wi-Fi so that you can access it over the network.


This will also allow you to cut your teeth on the GNU/Linux development environment, which can be beneficial if you’re mainly debugging your code on Windows or OS X. Also, if Windows is your regular cup of tea, a Mac mini tucked under your desk can offer you easy access to an OS X development environment.


You can rent cloud-based hosts running the operating systems you want to use.


It’s not always necessary to use another operating system or device to debug your software on diverse compiler and runtime environments. You can easily introduce ecosystem variety on your own development workstation.


By doing so, you can regularly benefit from the additional errors and warnings and stricter conformance in some areas that another environment can offer you. As is the case with static analysis tools, different compilers can typically detect more problems than a single one can.


This includes both portability problems, which may inadvertently creep in due to lax checking by a particular compiler, and logical flaws that one compiler may not warn about.


Compilers are very good at compiling any legal code into a matching executable but are sometimes not as good at flagging misuse of the language, for example, identifying code that works only if a particular included header file also declares some undocumented elements.


A second pair of compiler eyes can help you in this regard. All you need to do is to install, and use as part of your debugging lifecycle, an alternative to your mainstream environment. Here are some suggestions.


  1. For .NET Framework development, use Mono in parallel with Microsoft's tools and environment.
  2. For the development of Ada, C, C++, Objective-C, and other supported languages, use both LLVM and GCC.
  3. For Java development, use both the OpenJDK (or Oracle’s offering from the same code base) and GNU Classpath. Also, try using more than one Java runtime.
  4. For Ruby programs, apart from the reference CRuby implementation, try other VMs: JRuby, Rubinius, and mruby.


A more radical alternative involves reimplementing part of your code in another language. This can be helpful when you’re debugging a tricky algorithm. The typical case involves an initial (failing) implementation written in a relatively low-level language, such as C. Consider implementing an alternative in a higher-level language: Python, R, Ruby, Haskell, or the Unix shell.


An alternative implementation built on the language’s high-level features, such as operations on sets, pipes and filters, and higher-order functions, may help you arrive at a correctly functioning algorithm.


Through this method, you can quickly identify problems in the algorithm’s design and also fix implementation faults. Then, if performance is really critical, you can implement the algorithm in the original language or a language that’s closer to the CPU and use differential debugging techniques to make it work.


Things to Remember

  1. Diverse compilation and execution platforms can offer you valuable debugging insights.
  2. Fix a tricky algorithm by implementing it in a higher-level language.


Focus Your Work on the Most Important Problems

Most big software systems ship and operate with countless (known and unknown) bugs. Deciding in an intelligent manner on which bugs to concentrate and which bugs to ignore will increase your debugging effectiveness.


Hopefully, you’re not being paid to minimize the number of open issues, but to help deliver reliable, usable, maintainable, and efficient software.


Therefore, set priorities through an issue-tracking system and use them to concentrate your work on top-priority issues and to ignore low-priority ones. Here are a few points to help your prioritization.


Give a high priority to the following types of problems

Data loss: This can occur as a result of either data corruption or usability issues. Users entrust their data to your software. If you lose their data you violate that trust, and trust that is lost is difficult to regain.


Security: This may affect the confidentiality or integrity of the software's data, the integrity of the system where your software is running, or the availability of the service your software is providing.


Such problems are often exploited by malicious individuals and can, therefore, result in large monetary and reputational damage. Security problems can also garner unwelcome attention from regulatory authorities or extortionists. Consequently, sweeping security issues under the carpet is not an option.


Reduced service availability: If your software is providing a service, the cost of downtime may be measured in dollars (sometimes millions of them). Lost goodwill, late-night phone calls from irate managers, and clogged support desks are additional consequences you want to avoid.


Safety: These are issues that may result in death or serious injury to people, loss or severe damage to property, or environmental harm. All consequences of the preceding problems apply here. If your software can fail in such a way, you should have more rigorous processes than this list to guide your actions.


Crash or freeze: This may result in data loss or downtime, and it may also signify an underlying security problem. Thankfully, you can often easily debug a crashed or non-responding application through postmortem debugging. Consequently, it makes little sense to give such issues a low priority.


Code hygiene: Compiler warnings, failed assertions, unhandled exceptions, memory leaks, and, in general, inferior code quality provide a fertile ground for serious bugs to develop and hide. Therefore, don’t let such issues persist and accumulate.


The following are types of problems you may decide to relegate to a lower priority. These issues are not by themselves unworthy of your attention. However, they are issues you may be able to set aside in order to deal with more urgent ones.


Legacy support: Support for outdated hardware, API, and file formats is commendable, but, from a business perspective, it won’t get you very far because, by definition, you’re serving a shrinking market.


Backward compatibility: Here the case is less clear-cut because if your software evolves in a way that leaves behind past users, you’re losing customer goodwill.


Some companies, such as Nikon, have established a stellar reputation by maintaining backward compatibility through many generations of their product: you can still use a 1970s Nikkor lens on high-end modern Nikon cameras.


On the other hand, some successful software firms are known for their “take no prisoners” approach, where they ditch support for older software and services without any qualm. Sometimes it may be worth eliminating support for an old feature in order to focus on the future.


Cosmetic issues: These may be devilishly hard to get right and easy to ignore. You are unlikely to lose business over a truncated bubble-help, but dynamically adjusting a panel’s size based on the screen’s DPI setting can be a nightmare.


Documented workarounds: You may be able to avoid debugging some tricky issues by documenting a workaround. After switching on my TV, the first time I try to use the TV’s remote to operate the media player I get a “Please try again” prompt. I suspect that properly fixing this minor problem may be a major project.


Rarely used features: For problems associated with an exotic, rarely used feature of your software, it may be more productive to yank the corresponding feature (and deal with the small, if any, fallout) than to actually solve the problem. Collecting usage data regarding your software can make it easier for you to reach such decisions.


Note that you should be explicit when you decide to ignore a low-priority issue. File it in the issue-tracking system, and then close it with an action such as “won’t solve.” This documents the decision you’ve made and helps avoid the management overhead of future duplicate issues.


Things to Remember

  1. Not all problems are worth solving.
  2. Fixing a low-priority issue may deprive you of the time required to address a high-priority one.


Handle All Problems through an Issue-Tracking System

First, ensure you have an issue-tracking system in place. Many source code repositories, such as GitHub and GitLab, provide a basic version of such a system integrated with the rest of the functionality they provide.


A number of organizations use JIRA, a much more sophisticated proprietary system that can be licensed to run on-premises or as a service. Others opt for an open-source alternative, such as Bugzilla, Launchpad, OTRS, Redmine, or Trac. Which system you choose matters less than using it to file all issues.


Refuse to handle any problem that’s not recorded in the issue-tracking system. The unswerving use of such a system:

  1. Provides visibility to the debugging effort
  2. Enables the tracking and planning of releases
  3. Facilitates the prioritization of work
  4. Helps document common issues and solutions
  5. Ensures that no problems fall through the cracks
  6. Allows the automatic generation of release notes
  7. Serves as a repository for measuring defects, reflecting on them, and learning from them


For persons too senior in the organization’s hierarchy to be told to file an issue, simply offer to file it for them. The same goes for issues that you discover yourself. Some organizations don’t allow any changes to the source code unless they reference an associated issue.


Also, ensure that each issue contains a precise description of how to reproduce it. Your ideal here is a short, self-contained, correct (compilable and runnable) example (SSCCE), something that you can readily cut and paste into your application to see the problem.


To improve your chances of getting well-written bug reports, write instructions on how a good report should look, and brainwash all bug reporters to follow them religiously. (In one organization, I saw these instructions posted on the lavatory doors.)


Other things you should be looking for in a bug report are a precise title, the bug’s priority and severity, and the affected stakeholders, as well as details about the environment where it occurs.


Here are the key points concerning these fields. A precise short title allows you to identify the bug in a summary report. “Program crashes” is a horrible title; “Crash when clicking on Refresh while saving” is a fine one.


The severity field helps you prioritize the bugs. Problems where data loss occurs are obviously critical, but cosmetic issues or those where a documented workaround is possible are less so. A bug’s severity allows a team to triage a list of issues, deciding which to address now, which to tackle later, and which to ignore.


The result of triaging and prioritization can be recorded as the issue’s priority, which gives you the order in which to work. In many projects, the bug’s priority is set by the developer or project lead because end users tend to assign top priority to all the bugs they submit.


Setting and recording realistic priorities fends off those managers, customer representatives, developers in other teams, and salespeople who claim that everything (or, at least, their pet issue) is a top priority.


Identifying an issue’s stakeholders helps the team get additional input regarding an issue and the product owner prioritize the issues. Some organizations even tag stakeholders with the yearly revenue they bring in. 


A description of the environment can provide you with a clue on how to reproduce an elusive bug. In the description, avoid the kitchen sink approach in which you demand everything, from the PC’s serial number and BIOS date to the version of each library installed on the system. This will overburden your users, and they may just skip those fields.


Instead, ask only for the most relevant details; for a web-based app, the browser is certainly important. For a mobile app, you probably want the device maker and model. Even better, automate the submission of these details through your software.


When you work with an issue-tracking system, an important good practice is to use it to document your progress. Most tracking systems allow you to append to each entry successive free-form comments.


Use these to document the steps you take for investigating and fixing the bug, including dead ends. This brings transparency to your organization’s workings.


Write down the precise command incantations you use to log or trace the program’s behavior. These can be invaluable when you want to repeat them the next day, or when you (or some colleagues) hunt a similar bug a year later.


The notes can also help refresh your memory when, blurry-eyed and phased out from a week-long bug-hunting exercise, you try to explain to your team or manager what you’ve been doing all those days.


Things to Remember

  1. Handle all problems through an issue-tracking system.
  2. Ensure each issue contains a precise description of how to reproduce the problem with a short, self-contained, correct example.
  3. Triage issues and schedule your work based on the priority and severity of each issue.
  4. Document your progress through the issue-tracking system.


Use Focused Queries to Search the Web for Insights into Your Problem

It’s quite rare these days to work in a location lacking Internet access, but when I find myself in one, my productivity as a developer takes a dive. When your code fails, the Internet can help you find a solution, both through web searches and through collaboration with fellow developers.


A remarkably effective search technique involves pasting the error message reported by a failing third-party component into the browser’s search box enclosed in double quotes. The quotes instruct the search engine to look for pages containing the exact phrase, and this increases the quality of the results you’ll get back.


Other useful things to put into the search box include the name of the library or middleware that gives you trouble, the corresponding name of the class or method, and the returned error code.


The more obscure the name of the function you look for, the better. For example, searching for PlgBlt will give you far better results than searching for BitBlt. Also try synonyms, for instance, “freezes” in addition to “hangs,” or “grayed” in addition to “disabled.”


You can often solve tricky problems associated with the invocation of APIs by looking at how other people use them. Look for open-source software that uses a particular function, and examine how the parameters passed to it are initialized and how its return value is interpreted.


For this, using a specialized code search engine, such as the Black Duck Open Hub Code Search, can provide you with better results than a generic Google search.


For example, searching for mktime on this search engine, and filtering the results for a specific project to avoid browsing through library declarations and definitions, produces the following code snippet.

nowtime = mktime(time->tm_year+1900, time->tm_mon+1,
                 time->tm_mday, time->tm_hour, time->tm_min, time->tm_sec);


This shows that the mktime function, in contrast to localtime, expects the year to be passed in full, rather than as an offset from 1900, and that the numbering of months starts from 1. These are things you can easily get wrong, especially if you haven’t read the function’s documentation carefully.


When looking through the search results, pay attention to the site hosting them. Through considerable investment in techniques that motivate participants, sites of the Stack Exchange network, such as Stack Overflow, typically host the most pertinent discussions and answers.


When looking at an answer on Stack Overflow, scan beyond the accepted one, looking for answers with more votes. In addition, read the answer’s comments because it is there that people post updates, such as newer techniques to avoid an error.


If your carefully constructed web searches don’t come up with any useful results, it may be that you’re barking up the wrong tree. For popular libraries and software, it’s quite unlikely that you’ll be the first to experience a problem.


Therefore, if you can’t find a description of your problem online, it may be the case that you’ve misdiagnosed what the problem is. Maybe, for example, the API function that you think is crashing due to a bug in its implementation is simply crashing because there’s an error in the data you’ve supplied to it.


If you can’t find the answer online, you can also post your own question on Stack Overflow regarding the problem you’re facing. This, however, requires considerable investment in creating an SSCCE. This is the gold standard for asking questions in a forum: a short piece of code that other members can copy, paste, and compile on its own to witness your problem.


For some languages, you can even present your example in live form through an online IDE, such as SourceLair or JSFiddle. You can find more details on how to write good examples for specific languages and technologies at Short, Self Contained, Correct Example. Also worth reading is Eric Raymond's guide on this topic titled How To Ask Questions The Smart Way.


I’ve often found that simply the effort of putting together a well-written question and an accompanying example led me to my problem’s solution. But even if this doesn’t happen, the good example is likely to attract knowledgeable people who will experiment with it and, hopefully, provide you with a solution.


If your problem is partly associated with an open-source library or program, and if you have strong reasons to believe that there’s a bug in that code, you can also get in contact with its developers. Opening an issue on the code’s bug-tracking system is typically the way to go. 


Again here, make sure that there isn’t a similar bug report already filed, and that you include in your report precise details for reproducing the problem.


If the software doesn’t have a bug-tracking system, you can even try sending an email to its author. Be even more careful, considerate, and polite here; most open-source software developers aren’t paid to support you.


Things to Remember

  1. Perform a web search regarding error messages by enclosing them in double quotes.
  2. Value the answers from StackExchange sites.
  3. If all else fails, post your own question or open an issue.


Confirm That Preconditions and Postconditions Are Satisfied

When repairing electronic equipment, the first thing to check is the power supplied to it: what comes out of the power supply module and what is fed into the circuit. In far too many cases, this points to the failure’s culprit.


Similarly, in computing, you can pinpoint many problems by examining what must hold at a routine’s entry point (its preconditions: program state and inputs) and at its exit (its postconditions: program state and returned values).


If the preconditions are wrong, then the fault lies in the part that set them up; if the postconditions are wrong, then there’s a problem with the routine. If both are correct, then you should look somewhere else to locate your bug.


Put a breakpoint at the beginning of the routine, or the location where it’s called, or the point where a crucial algorithm starts executing.


To verify that the preconditions have been satisfied, examine carefully the algorithm’s arguments, including parameters, the object on which a method is invoked, and the global state used by the suspect code. In particular, pay attention to the following.


  1. Look for values that are null when they shouldn’t be.
  2. Verify that arithmetic values are within the domain of the called math function; for example, check that the value passed to log is greater than zero.
  3. Look inside the objects, structures, and arrays passed to the routine to verify that their contents match what is required; this also helps you pinpoint invalid pointers.
  4. Check that values are within a reasonable range. Often uninitialized variables have a suspect value, such as 6.89851e-308 or 61007410.
  5. Spot-check the integrity of any data structure passed to the routine; for example, that a map contains the expected keys and values, or that you can correctly traverse a doubly linked list.


Then, put a breakpoint at the end of the routine, or after the location where it’s called, or at the point where a crucial algorithm ends its execution. Now examine the effects of the routine’s execution.


  1. Do the computed results look reasonable? Are they within the range of expected results?
  2. If yes, are the results actually correct? You can verify this by executing the corresponding code by hand, by comparing them with known good values, or by calculating them with another tool or method.
  3. Are the routine’s side effects the expected ones? Has any other data touched by the suspect code been corrupted or set to an incorrect value? This is especially important for algorithms that maintain their own housekeeping information within the data structures they traverse.
  4. Have the resources obtained by the algorithm, such as file handles or locks, been correctly released?

You can use the same method for higher-level operations and setups. Verify the operation of an SQL statement that constructs a table by looking at the tables and views it scans and the table it builds.


Work on a file-based processing task by examining its input and output files. Debug an operation that is built on web services by looking at the input and output of each individual web service.


Troubleshoot an entire data center by examining the facilities required and provided by each element: networking, DNS, shared storage, databases, middleware, and so on. In all cases, verify, don’t assume.


Things to Remember

Carefully examine a routine’s preconditions and postconditions.