20+ Debugging Tips (2019)


The fast and efficient reproduction of a problem will improve your debugging productivity. This post presents more than 20 debugging tips, with examples showing how to debug efficiently.

 

Review and Manually Execute Suspect Code

On your first pass through the code, carefully examine each line, and look for common mistakes. Nowadays, you can avoid many such mistakes through appropriate conventions, or have a static analysis tool point them out to you. Nevertheless, mistakes can slip through, especially in code that does not (ahem) adhere to the necessary coding conventions.

 

Look for errors in operator precedence, missing braces and break statements, extra semicolons, the use of an assignment instead of a comparison, uninitialized or wrongly initialized variables, statements that are missing from a loop, off-by-one errors, erroneous type conversions, missing methods, spelling errors, and language-specific gotchas.
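Several of these mistakes have direct counterparts in Python. The following minimal sketch, with hypothetical function names, shows two of them next to corrected versions: an off-by-one error in a loop bound, and a Python-specific gotcha, a mutable default argument that is silently shared between calls.

```python
def sum_first_n_buggy(values, n):
    """Meant to sum the first n values, but range(1, n) skips values[0]."""
    total = 0
    for i in range(1, n):  # off-by-one: the loop should start at index 0
        total += values[i]
    return total


def sum_first_n(values, n):
    """Corrected version: range(n) yields the indices 0 .. n-1."""
    total = 0
    for i in range(n):
        total += values[i]
    return total


def append_item_buggy(item, items=[]):
    """Gotcha: the default list is created once and shared by every call."""
    items.append(item)
    return items


def append_item(item, items=None):
    """Corrected version: create a fresh list on each call that needs one."""
    if items is None:
        items = []
    items.append(item)
    return items


if __name__ == "__main__":
    print(sum_first_n_buggy([1, 2, 3], 3), sum_first_n([1, 2, 3], 3))  # 5 6
    print(append_item_buggy(1), append_item_buggy(2))  # [1, 2] [1, 2]
    print(append_item(1), append_item(2))  # [1] [2]
```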

 

To execute code by hand, have an empty sheet of paper at your side, write down the names of the key variables, and start executing the statements in the order the computer would.

 

Every time a variable changes its value, cross out the old value and write the new one. Writing the values with a pencil makes it easier to fix any errors you make.

 

A (real) calculator may help you derive the values of complex expressions more quickly. A programmer’s calculator can be helpful if you’re dealing with bit operations.

 

Avoid using your computer: manipulating the variable values in a spreadsheet, browsing the code with an editor, or quickly checking whether any new email has arrived will make it difficult to concentrate deeply, which is what this method is all about.

 

If the code manipulates complex data structures, draw them with lines, boxes, circles, and arrows. Devise a notation to draw the algorithm’s most important parts.

 

For example, if you’re drawing intervals, you can draw the closed end with a line ending in a square bracket and the open end (the interval’s end, if you’re following the correct convention) with a round bracket.
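In code, the convention mentioned above is the half-open interval [start, end), which Python’s range() and slicing also follow. A minimal sketch with hypothetical helper names shows why it is convenient: an interval’s length is simply end minus start, and adjacent intervals such as [0, 5) and [5, 10) do not overlap.

```python
def length(interval):
    """Length of a half-open interval [start, end)."""
    start, end = interval
    return end - start


def overlaps(a, b):
    """True if half-open intervals a = [a0, a1) and b = [b0, b1) share a point."""
    return a[0] < b[1] and b[0] < a[1]


if __name__ == "__main__":
    print(length((3, 7)))             # 4, the same as len(range(3, 7))
    print(overlaps((0, 5), (5, 10)))  # False: the intervals only touch at 5
    print(overlaps((0, 5), (4, 10)))  # True: both contain 4
```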

 

You may also often find it useful to draw the parts of a program’s call graph that interest you. If you’re fluent in UML, use that for your diagrams, but don’t sweat too much about getting the notation right; seek a balance between the ease of drawing and the comprehension of what you’ve drawn.

 

A larger sheet of paper may give you the space you need for your diagram. A whiteboard provides an even larger surface and also makes it easier to erase parts and collaborate.

 

Add colors to the picture to easily distinguish the elements you draw. If the diagram you’ve drawn is important, take a picture when you’re finished and attach it to the corresponding issue.

 

One notch fancier is the manipulation of physical objects, such as white-board magnets, paper clips, toothpicks, sticky notes, checker pieces, or Lego blocks. This increases the level of your engagement with the problem by bringing into play more senses: 3D vision, touch, and proprioception.

 

You can use this method to simulate queues, groupings, protocols, ratings, priorities, and a lot more. Just don’t get carried away playing with the objects that are supposed to help you with your work.

 

Things to Remember

  1. Look through the code for common mistakes.
  2. Execute the code by hand to verify its correctness.
  3. Untangle complex data structures by drawing them.
  4. Address complexity with large sheets of paper, a whiteboard, and color.
  5. Deepen your engagement with a problem by manipulating physical objects.

 

Go Over Your Code and Reasoning with a Colleague

The rubber duck technique is probably the most effective one you’ll find in this post, measured by the number of times you can apply it. It involves explaining how your code works to someone else. Typically, half-way through your explanation, you’ll exclaim, “Oh wait, how silly of me, that’s the problem!” and be done.

 

When this happens, rest assured that this was not a silly mistake that you carelessly overlooked. By explaining the code to your colleague, you engaged different parts of your brain, and these pinpointed the problem. In most cases, your colleague plays a minimal role.

 

This is how the technique gets its name: explaining the problem to a rubber duck could have been equally effective. (In the entry on rubber duck debugging, Wikipedia actually has a picture of a rubber duck sitting at a keyboard.)

 

You can also engage your colleagues in a more meaningful way by asking them to review your code. This is a more formal undertaking in which your colleague carefully goes through the code, pinpointing all problems in it: from code style and commenting, to API use, to design and logical errors.

 

So highly regarded is this technique that some organizations have a code review as a prerequisite for integrating code into a production branch. Tools for sharing comments, such as Gerrit and GitHub’s code commenting functionality, can be really helpful because they allow you to respond to comments and leave a record of how each one is addressed.

 

Etiquette plays an important role in this activity. Don’t take the comments (even the harsh ones) personally, but see them as an opportunity to improve your code. Try to address all review comments; even if a comment is wrong, it is a sign that your code is not clear enough.

 

Also, if you ask others to review your code, you should offer to review theirs and do that promptly, professionally, and politely. Code stuck waiting for a code review, trivial comments that overlook bigger problems, and nastiness diminish the benefits gained by the practice of code reviews.

 

Finally, you can address tough problems in multi-party algorithms through role-playing. For example, if you’re debugging a communications protocol, you can take the role of one party, a colleague can take the role of the other, and you can then take turns attempting to break the protocol.

 

Other areas where this can be effective are security (you get to play Bob and Alice), human-computer interaction, and workflows. Passing around physical objects such as an “edit” token can help you here. Wearing elaborate costumes would be overdoing it though.

 

Things to Remember

  1. Explain your code to a rubber duck.
  2. Engage in the practice of code review.
  3. Debug multi-party problems through role-playing.

 

Add Debugging Functionality

By telling your program that it is being debugged, you can turn the tables and have the program actively help you to debug it.

 

What’s needed for this to work is a mechanism to turn on this debugging mode and the code implementing this mode. You can program your software to enter a debugging mode through one of the following:

  1. A compilation option, such as defining a DEBUG constant in C/C++ code
  2. A command-line option such as the -d switch used in the Unix SSH daemon and many other programs
  3. A signal sent to a process, as was the case in old versions of the BIND domain-name server

 

To avoid accidentally shipping or configuring your software with debugging enabled in a production environment, it’s a good practice to include a prominent notice to the effect that the debugging mode is available or enabled. Once your program can enter a debugging mode, there are several things you can program it to do.
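A minimal sketch of entering a debugging mode through a command-line option, written in Python with the standard argparse module. The -d switch mirrors the convention mentioned above; the program name myapp and the wording of the notice are assumptions. The notice goes to standard error so that it is hard to miss.

```python
import argparse
import sys


def parse_args(argv):
    """Parse the command line; -d/--debug turns on the debugging mode."""
    parser = argparse.ArgumentParser(prog="myapp")  # hypothetical program name
    parser.add_argument("-d", "--debug", action="store_true",
                        help="enable the debugging mode")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if args.debug:
        # Prominent notice, so nobody runs or ships this mode unawares.
        print("*** WARNING: myapp is running in DEBUG mode ***", file=sys.stderr)
    return args.debug


if __name__ == "__main__":
    main()
```

Running `myapp -d` would then print the warning before doing anything else; the rest of the program can consult the returned flag (or a global set from it) to enable the features described below.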

 

First, you can make it log its actions so that you can get notified when something happens, and you can later examine the sequence of various events.

 

For interactive and graphics programs, it may also be helpful to have the debugging mode display more information on the screen or enhance the information already there.

 

For example, Minecraft has a debug mode in which it overlays the game screen with performance figures (frames per second, memory used, CPU load), player data (coordinates, direction, light), and operating environment specifications (JVM, display technology, CPU type).

 

It also features a debug mode world, which displays—laid out flat—the thousands of materials in all their conditions that exist in the game.

 

In a rendering application, you could display the edges that make up each object’s facets or the points controlling a Bezier curve. In web applications, it can be helpful to have additional data, such as a product’s database ID, appear when you hover the mouse over the corresponding screen element.

 

The debugging mode can also enable additional commands. These may be accessible through a command-line interface, an additional menu, or a URL.

 

You can implement commands to display and modify complex data structures (debuggers have difficulty processing these), dump data into a file for further processing, change the state into one that will help your troubleshooting, or perform other tasks described in this post.

 

A very helpful debugging mode feature is the ability to enter a specific state. As an example, consider the task of debugging the seventh step of a wizard-like interface. Your job will be a lot easier if the debugging mode provides you with a shortcut to skip the preceding six steps, perhaps using some sensible default values for them.
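A sketch of such a shortcut, assuming a hypothetical wizard whose steps are functions that each update a shared state dictionary. In debugging mode the caller may start at a later step, and the skipped steps’ fields are filled in from a table of made-up defaults; outside debugging mode, skipping is refused.

```python
# Hypothetical defaults standing in for what a user would enter in each step.
DEFAULTS = {
    1: {"name": "Test User"},
    2: {"email": "test@example.com"},
    3: {"plan": "basic"},
}


def run_wizard(steps, start_step=1, debug=False):
    """Run wizard steps in order; in debug mode, allow skipping earlier ones.

    steps maps a step number to a function that takes and updates the state.
    """
    if start_step != 1 and not debug:
        raise ValueError("skipping steps is only allowed in debugging mode")
    state = {}
    for number in sorted(steps):
        if number < start_step:
            state.update(DEFAULTS.get(number, {}))  # fill in a skipped step
        else:
            steps[number](state)
    return state
```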

 

Similar features are also useful for debugging games, where they can advance you to a high level (depriving you of the pleasure and the excuse of playing to get there) or give you hard-to-earn additional powers.

 

A debugging mode can also increase a program’s transparency or simplify a program’s runtime behavior to make it easier to pin down failures for debugging.

 

For instance, when invoked with debug mode enabled, programs that operate silently in the background (daemons in Unix parlance, services in the Windows world) may operate in the foreground displaying output on the screen. 

 

If a program fires up many threads, you can make it run with just a single thread, simplifying the debugging of problems that are not associated with concurrency.
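A minimal sketch of this switch, assuming the work can be expressed as independent tasks (the function and parameter names are hypothetical). Normally a thread pool from Python’s concurrent.futures runs the tasks concurrently; in debugging mode the same tasks run one after another on the calling thread, in a deterministic order.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(func, items, debug=False, workers=8):
    """Apply func to every item, concurrently unless debugging."""
    if debug:
        # Single-threaded: deterministic order and simple stack traces.
        return [func(item) for item in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even when run concurrently.
        return list(pool.map(func, items))
```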

 

Other changes you can implement can substitute simple or naive algorithms for more sophisticated ones, eliminate peripheral operations to boost performance, use synchronous instead of asynchronous APIs, or use an embedded lightweight application server or database instead of an external one.

 

For software lacking a user interface, such as that running on some embedded devices or servers, a debugging mode can also expose additional interfaces.

 

Adding a command-line interface can allow you to enter debugging commands and see their results. In embedded devices, you can have that interface run over a serial connection that is set up only in the debug mode.

 

Some digital TVs can use their USB interface in this way. In applications working in a networked environment, you can include a small embedded HTTP server, such as the libmicrohttpd one, which can display key details of the application and also can offer the execution of debugging commands.

 

The debugging mode can also help you simulate external failures. These are typically rare events that may require tricky instrumentation to simulate in order to troubleshoot. A debugging mode can offer commands that, by changing the program’s state, can simulate its behavior under such conditions.

 

Thus, debugging mode commands can simulate the random dropping of network packets, the failure to write data to disk, radio signal degradation, a malfunctioning real-time clock, a misconfigured smart card reader, and so on.
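A sketch of the first of these, assuming a hypothetical program that routes every outgoing packet through a single send function. The wrapper discards a configurable fraction of packets, which a debug mode command could set above zero; seeding Python’s random.Random makes a failing run reproducible.

```python
import random


class LossyLink:
    """Wrap a send function and, in debugging mode, randomly drop packets."""

    def __init__(self, send, drop_rate=0.0, seed=None):
        self.send = send
        self.drop_rate = drop_rate      # 0.0 in normal operation
        self.rng = random.Random(seed)  # seeded for reproducible runs
        self.dropped = 0

    def transmit(self, packet):
        """Send packet unless the simulated network loses it."""
        if self.rng.random() < self.drop_rate:
            self.dropped += 1           # simulate a lost packet
            return False
        self.send(packet)
        return True
```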

 

Finally, the debugging mode provides you with a mechanism to exercise rare code paths. This works by changing the program’s configuration to favor their execution, instead of a more optimal path.

 

For example, if you have a user input memory buffer that starts with a 1 kB allocation and doubles in size every time it fills, you can have your program’s debug mode initialize the buffer with space for a single byte.

 

This guarantees that the reallocation will be exercised frequently, so you will be able to observe and fix bugs in its logic. Other cases involve configuring tiny hash table sizes (to stress-test the overflow logic) and very small cache buffers.
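A sketch of this idea in Python, using a hypothetical buffer class that manages its capacity explicitly: when it fills, it allocates a new bytearray of twice the size and copies the old contents, mimicking the C-style buffer described above. The debug flag shrinks the initial allocation from 1 kB (1024 bytes) to a single byte, and a counter records how often the growth path ran.

```python
class GrowableBuffer:
    """Byte buffer that doubles its capacity whenever it fills up."""

    def __init__(self, debug=False):
        # Debug mode starts tiny so that the reallocation logic runs constantly.
        self.capacity = 1 if debug else 1024
        self.data = bytearray(self.capacity)
        self.length = 0
        self.reallocations = 0

    def append(self, byte):
        if self.length == self.capacity:
            self._grow()
        self.data[self.length] = byte
        self.length += 1

    def _grow(self):
        new_data = bytearray(self.capacity * 2)
        new_data[:self.length] = self.data[:self.length]  # copy old contents
        self.data = new_data
        self.capacity *= 2
        self.reallocations += 1

    def contents(self):
        return bytes(self.data[:self.length])
```

With the debug flag set, appending the five bytes of b"hello" already triggers three reallocations (capacity 1 to 2, 2 to 4, and 4 to 8), whereas the normal 1 kB buffer would not grow at all.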

 

Things to Remember

  1. Add to your program an option to enter a debug mode.
  2. Add commands to manipulate the program’s state, log its operation, reduce its runtime complexity, shortcut through user interface navigation, and display complex data structures.
  3. Add command-line, web, and serial interfaces to debug embedded devices and servers.
  4. Use debug mode commands to simulate external failures.

 

Add Logging Statements

Logging statements allow you to follow and comprehend the program’s execution. They typically send a message to an output device or store it in a place that you can later browse and analyze (a file or a database). You can then examine the log to find the root cause of the problem you’re investigating.

 

Some believe that logging statements are only employed by those who don’t know how to use a debugger. There may be cases where this is true, but it turns out that logging statements offer a number of advantages over a debugger session, and therefore the two approaches are complementary.

 

First of all, you can easily place a logging statement in a strategic location and tailor it to output exactly the data you require. In contrast, a debugger, as a general-purpose tool, requires you to follow the program’s control flow and manually unravel complex data structures.

 

Furthermore, the work you invest in a debugging session only has ephemeral benefits. Even if you save your setup for printing a complex data structure in a debugger script file, it would still not be visible or easily accessible to other people maintaining the code. I have yet to encounter a project that distributes debugger scripts together with its source code.

 

On the other hand, because logging statements are permanent, you can invest more effort than you could justify in a fleeting debugging session to format their output in a way that will increase your understanding of the program’s operation and, therefore, your debugging productivity.

 

Finally, the output of proper logging statements (those using a logging framework rather than random println statements) is inherently filterable and queryable.

 

There are several logging libraries for most languages and frameworks. Find and use one that matches your requirements, rather than reinventing the wheel. Things you can log include the entry and exit to key routines, contents of key data structures, state changes, and responses to user interactions.

 

To avoid the performance hit of extensive logging, you don’t want to have it enabled in a normal production setting. Most of the logging interfaces allow you to tailor the importance of messages recorded either at the source or at the destination.

 

Obviously, controlling the recorded messages at the source can minimize the performance impact your program will incur; in some cases down to zero.

 

Implementing a debug mode in your application allows you to increase the logging verbosity only when needed. You can also configure several levels or areas of logging to fine-tune what you want to see. Many logging frameworks provide their own configuration facility, freeing you from the effort to create one for your application.
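A sketch of source-side control with Python’s logging module, where the logger name and the expensive_summary() helper are hypothetical. A debug mode sets the logger’s level, which discards lower-priority messages at the source; isEnabledFor() additionally lets you skip the work of building a costly message when its level is disabled.

```python
import logging

logger = logging.getLogger("myapp")  # hypothetical application logger


def configure_logging(debug):
    """A debug mode raises verbosity; otherwise only warnings and worse pass."""
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)


def expensive_summary(data):
    """Stand-in for costly work that is only needed for a log message."""
    return sorted(data)


def process(data):
    # %-style arguments are only formatted if the record is actually emitted.
    logger.debug("processing %d items", len(data))
    # Guard costly work so that it is skipped entirely at the source.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("sorted contents: %s", expensive_summary(data))
    return len(data)


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    configure_logging(debug=True)
    process([3, 1, 2])
```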

 

Logging facilities you may want to use include the Unix syslog library, Apple’s more advanced system log facility ASL, the Windows ReportEvent API, Java’s java.util.logging package, and Python’s logging module. The interfaces to some of these facilities are not trivial, so refer to the following listings as a cheat sheet for using each one in your code.

 

There are also third-party logging facilities that can be useful if you’re looking for more features or if you’re working on a platform that lacks a standard one. These include Apache’s Log4j for Java and Boost.Log v2 for C++.

 

Listing Logging with the Unix syslog interface

#include <syslog.h>

int
main()
{
    openlog("myapp", 0, LOG_USER);
    syslog(LOG_DEBUG, "Called main() in %s", __FILE__);
    closelog();
}

 

Listing Logging with Apple’s system log facility

#include <asl.h>

int
main()
{
    asl_object_t client_handle = asl_open("com.example.myapp", NULL,
        ASL_OPT_STDERR);
    asl_log(client_handle, NULL, ASL_LEVEL_DEBUG,
        "Called main() in %s", __FILE__);
    asl_close(client_handle);
}

Listing Logging with the Windows ReportEvent function

#include <windows.h>

int
main()
{
    // Plain string literals assume an ANSI (non-UNICODE) build.
    LPCTSTR lpszStrings[] = {
        "Called main() in file ",
        __FILE__
    };
    HANDLE hEventSource = RegisterEventSource(NULL, "myservice");
    if (hEventSource == NULL)
        return (1);
    ReportEvent(hEventSource,       // handle of event source
        EVENTLOG_INFORMATION_TYPE,  // event type
        0,                          // event category
        0,                          // event ID
        NULL,                       // current user's SID
        2,                          // strings in lpszStrings
        0,                          // no bytes of raw data
        lpszStrings,                // array of error strings
        NULL);                      // no raw data
    DeregisterEventSource(hEventSource);
    return (0);
}

Listing Logging with Java’s java.util.logging package

import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class EventLog {
    public static void main(String[] args) {
        Logger logger = Logger.getGlobal();
        // Include detailed messages
        logger.setLevel(Level.FINEST);
        FileHandler fileHandler = null;
        try {
            fileHandler = new FileHandler("app.log");
        } catch (IOException e) {
            System.exit(1);
        }
        logger.addHandler(fileHandler); // Send output to file
        logger.fine("Called main");
    }
}

Listing Logging with Python’s logging module

import logging

logger = logging.getLogger('myapp')
# Send log messages to myapp.log
fh = logging.FileHandler('myapp.log')
logger.addHandler(fh)
logger.setLevel(logging.DEBUG)
logger.debug('In main module')

 

In addition, many other programming frameworks offer their own logging mechanisms. For example, if you associate logging with lumberjacks, you’ll be happy to know that under node.js you can choose between the Bunyan and Winston packages.

 

If you’re using Unix shell commands, then you can log a message by invoking the logger command. In Unix kernels (including device drivers), it is customary to log messages with the printk function call.

 

If your code runs on a networked, embedded device lacking a writable file system with sufficient space where you can store the logs (e.g., a high-end TV or a low-end broadband router), consider using remote logging.

 

This technology allows you to configure the logging system of the embedded device to send the log entries to a server where these are stored. Thus, the following Unix syslogd configuration entry will send all logging associated with local1 to the log master host:

local1.* @@logmaster.example.com:514
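A Python program can also forward its records to such a host directly, through the standard library’s logging.handlers.SysLogHandler. A minimal sketch, assuming the same hypothetical logmaster.example.com host and the local1 facility; note that this handler uses UDP by default, whereas the @@ prefix in the entry above requests TCP forwarding in rsyslog.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(host="logmaster.example.com", port=514,
                       socktype=socket.SOCK_DGRAM):
    """Return a logger that forwards records to a remote syslog server."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL1,
        socktype=socktype,  # socket.SOCK_STREAM would use TCP instead of UDP
    )
    logger = logging.getLogger("myapp.remote")  # hypothetical logger name
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```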

 

Finally, if the environment you’re programming in doesn’t offer a logging facility, you’ll have to roll your own. In its simplest form, this can be a print statement.

printf("Entering function foo\n");

 


When you (think) you’re done with a print-type logging statement, resist the temptation to delete it or put it in a comment. If you delete it, you lose the work you put into creating it. If you comment it out, it will no longer be maintained, and as the code changes it will decay and become useless. Instead, place the print command in a conditional statement.

 

if (loggingEnabled)
    printf("Entering function foo\n");

Apart from a print statement, here are some other ways you can have applications log their actions.

 

  1. In a GUI application, fire up a popup message.
  2. In JavaScript code, write to the console and view the results in your browser’s console window.
  3. In a web application, stuff logging output in the resulting page’s HTML—as HTML comments or as visible text.

If you can’t modify an application’s source code, you can try making it open a file whose name is the message you want to log and trace the application’s system calls with strace to see the file’s name.

 

Things to Remember

  1. Add logging statements to set up a permanent, maintained debugging infrastructure.
  2. Use a logging framework instead of reinventing the wheel.
  3. Configure the topic and details of what you log through the logging framework.

 

Use Unit Tests

If a flaw in the software you’re debugging doesn’t show up in its unit testing, then appropriate tests are lacking or completely absent. To isolate or pinpoint such a flaw, consider adding unit tests that can expose it.

 

Start with the basics. If the software isn’t using a unit testing framework or isn’t written in a language that directly supports unit testing, download a unit testing package matching your requirements, and configure your software to use it.

 

With no existing tests in place, this should involve the adjustment of the build configuration to include the testing library and the addition of a few lines in the application’s startup code to run any tests. While you’re at it, configure your infrastructure to run the tests automatically when the code is compiled and committed.

 

This will ensure that your project will benefit from the improved documentation, collective ownership, ease of refactoring, and simplified integration facilitated by the unit testing infrastructure you’re adding.

 

Then, identify the routines that may be related to the failure you’re seeing, and write unit tests that will verify their functioning. You can find the routines to test through top-down or bottom-up reasoning.

 

Try writing the tests without looking at the routines’ implementation, focusing instead on the documentation of their interface or, if that’s lacking, on the code that calls them.

 

This will lessen the probability of you replicating a faulty assumption in the unit test. Ensure that the tests you added become a permanent part of the code by committing them to the software’s revision control repository.

 

As an example, consider the class in the following listing, which tracks the column position of processed text, taking into account the standard behavior of the tab character. This is notoriously difficult to get right: in the 1980s, screen output libraries contained workarounds for display terminals with buggy behavior in this area.

 

Listing A C++ class that tracks the text’s column position

class ColumnTracker {
private:
    int column;
    static const int tab_length = 8;
public:
    ColumnTracker() : column(0) {}
    int position() const { return column; }
    void process(int c) {
        switch (c) {
        case '\n':
            column = 0;
            break;
        case '\t':
            column = (column / tab_length + 1) * tab_length;
            break;
        default:
            column++;
            break;
        }
    }
};

 

Listing Code running the CppUnit test suite text interface

#include <cppunit/ui/text/TestRunner.h>
#include "ColumnTrackerTest.h"

int
main(int argc, char *argv[])
{
    CppUnit::TextUi::TestRunner runner;
    runner.addTest(ColumnTrackerTest::suite());
    // run() returns true on success; report failures through the exit status
    return runner.run() ? 0 : 1;
}

 

Listing Unit test code

#include <cppunit/extensions/HelperMacros.h>
#include "ColumnTracker.h"

class ColumnTrackerTest : public CppUnit::TestFixture {
    CPPUNIT_TEST_SUITE(ColumnTrackerTest);
    CPPUNIT_TEST(testCtor);
    CPPUNIT_TEST(testTab);
    CPPUNIT_TEST(testAfterNewline);
    CPPUNIT_TEST_SUITE_END();
public:
    void testCtor() {
        ColumnTracker ct;
        CPPUNIT_ASSERT(ct.position() == 0);
    }
    void testTab() {
        ColumnTracker ct;
        // Test plain characters
        ct.process('x');
        CPPUNIT_ASSERT(ct.position() == 1);
        ct.process('x');
        CPPUNIT_ASSERT(ct.position() == 2);
        // Test tab
        ct.process('\t');
        CPPUNIT_ASSERT(ct.position() == 8);
        // Test character after tab
        ct.process('x');
        CPPUNIT_ASSERT(ct.position() == 9);
        // Edge case
        while (ct.position() != 15)
            ct.process('x');
        ct.process('\t');
        CPPUNIT_ASSERT(ct.position() == 16);
        // Edge case
        ct.process('\t');
        CPPUNIT_ASSERT(ct.position() == 24);
    }
    void testAfterNewline() {
        ColumnTracker ct;
        ct.process('x');
        ct.process('\n');
        CPPUNIT_ASSERT(ct.position() == 0);
    }
};

 

Running the unit tests should expose the flawed routine. If the tests succeed, you’ll need to expand their coverage or (less frequently) verify their correctness.
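To see a test expose a flaw, here is a sketch that ports the class to Python and plants a hypothetical bug: a tab that always advances by eight columns instead of moving to the next tab stop. The same unittest check (a tab after one character should land on column 8) passes for the correct version and fails for the buggy one; run() is a hypothetical helper that runs the check against a given class.

```python
import unittest

TAB_LENGTH = 8


class ColumnTracker:
    """Python port of the C++ ColumnTracker class."""

    def __init__(self):
        self.column = 0

    def process(self, c):
        if c == "\n":
            self.column = 0
        elif c == "\t":
            # Advance to the next multiple of TAB_LENGTH.
            self.column = (self.column // TAB_LENGTH + 1) * TAB_LENGTH
        else:
            self.column += 1


class BuggyColumnTracker(ColumnTracker):
    """Hypothetical flawed variant: a tab always adds TAB_LENGTH columns."""

    def process(self, c):
        if c == "\t":
            self.column += TAB_LENGTH
        else:
            super().process(c)


class ColumnTrackerTest(unittest.TestCase):
    tracker_class = ColumnTracker

    def test_tab_after_character(self):
        ct = self.tracker_class()
        ct.process("x")
        ct.process("\t")
        self.assertEqual(ct.column, 8)


def run(tracker_class):
    """Run the test case against tracker_class; return True if it passes."""
    case = type("T", (ColumnTrackerTest,), {"tracker_class": tracker_class})
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


if __name__ == "__main__":
    print("correct version passes:", run(ColumnTracker))
    print("buggy version passes:", run(BuggyColumnTracker))
```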

 

If more than one test fails, focus on the failing routines that lie at the bottom of the dependency tree: those that call the fewest other routines. Once you’ve fixed the flawed routine, run the tests again to ensure they all now pass.

 

Bolting unit tests onto existing code isn’t trivial, because tests and code are typically developed in tandem, so that the code can be written in a testable form. Often the tests are even written before the corresponding code.

 

To unit test the suspect routines you may need to refactor the code: splitting large elements into smaller parts and minimizing dependencies between routines to simplify their invocation from the tests.

 

The techniques for doing this are beyond the scope of this post. An excellent treatment of the topic is Michael Feathers’ book Working Effectively with Legacy Code.

 

Things to Remember

  1. Pinpoint flaws by probing suspect routines with unit tests.
  2. Increase your effectiveness by adopting a unit testing framework, refactoring the code to accommodate the tests, and automating the tests’ execution.

 

Verify Your Reasoning by Perturbing the Debugged Program

Arbitrarily changing a program to see what will happen is disparagingly described as hacking. However, experimental changes that you make in a thoughtful manner can allow you to test hypotheses and learn more about the system you’re trying to debug as well as its underlying platform.

 

The changes are especially valuable if the quality of what you’re facing isn’t top notch: they may allow you to cover holes in the code’s documentation or that of the API.

 

Here are some examples of questions that might arise when you’re debugging a system, which you can easily answer by modifying some code.

  1. Can I indeed pass null as an argument to this routine?
  2. Will this code work correctly if the variable contains more than 999 milliseconds?
  3. Will, a warning get logged if a lock is held when entering this routine?
  4. Is the order of calling these methods related to my problem?
  5. Could an alternative API work better than the currently used one?
  6. You typically verify the effects of your changes by observing the program's behavior, by logging, or by running the code under a debugger.
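Questions like the first two can often be answered with a few lines of throwaway code. A minimal sketch of a hypothetical probe() helper that calls a routine with experimental arguments and reports either the result or the exception raised; the example calls try passing None to len() and passing more than 999 milliseconds to datetime.timedelta.

```python
import datetime


def probe(routine, *args, **kwargs):
    """Call routine with experimental arguments; report result or exception."""
    try:
        return ("ok", routine(*args, **kwargs))
    except Exception as exc:  # deliberately broad: we are exploring
        return ("raised", type(exc).__name__)


if __name__ == "__main__":
    # Can I pass None to this routine?
    print(probe(len, None))  # ('raised', 'TypeError')
    # Is a value above 999 ms accepted, or must I normalize it myself?
    print(probe(datetime.timedelta, milliseconds=1500))  # accepted as 1.5 s
```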

 

One experimental approach involves modifying expressions and values that are embedded in the code, often replacing a runtime expression with a concrete value. For instance, you can pass a known-correct constant value to a routine (or have a routine return such a value) to verify that the failure you’re trying to fix goes away.

 

Or, you can pass or return an incorrect value to see whether a problem you’re trying to isolate can be attributed to such a value. Alternatively, you can set a parameter to an extreme value in order to make a tiny or rare problem, such as performance degradation, easier to observe.

 

Another experimental avenue involves code changes that allow you to test the correctness of alternative implementations. Here you replace code that might be incorrect with conceivably better code and see whether this fixes your problem.

 

For example, the Microsoft Windows API provides more than five ways to obtain a string’s width on the screen, with little guidance regarding which function is preferable. If your problem is misaligned text, you could exchange one API call (GetTextExtentPoint32) for another (GetTextExtentExPoint) and observe the result.
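The same kind of swap works outside Windows. As a sketch, suppose (hypothetically) that a total which should be exactly 1.0 comes out slightly off. Replacing Python’s built-in sum() with the standard library’s math.fsum(), which avoids the loss of precision from intermediate rounding, tests the hypothesis that accumulated floating-point error is to blame.

```python
import math

values = [0.1] * 10  # ten tenths, which should add up to exactly 1.0

naive = sum(values)           # rounding error accumulates at each addition
accurate = math.fsum(values)  # tracks intermediate partial sums exactly

print(naive == 1.0, naive)    # False 0.9999999999999999
print(accurate == 1.0)        # True
```

If the alternative call makes the symptom disappear, you have both a likely cause and a candidate fix.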

 

Or, if you have doubts regarding the correct order for calling some routines, you can try an alternative one. In other cases, you can try extreme code simplifications.

 

Things to Remember

  1. Set values in the code by hand to identify correct and incorrect ones.
  2. If you can’t find guidance to correct the code, experiment by trying alternate implementations.

 

Minimize the Differences between a Working Example and the Failing Code

There are cases where you will have at hand the faulty code you’re debugging and an example of related functionality that works just fine.

 

This can often occur when you’re debugging a complex API invocation or an algorithm. You may get the working example from the API’s documentation, a Q&A site, open-source software, or a textbook.

 

The differences between the working example and your code can guide you to the fault. The approach I describe here is based on manipulating the source code; however, you can also look at differences in the runtime behavior of the two.

 

Before using the example code to fix the problem you’re facing, you first must compile and test it to verify that it actually works. If it doesn’t, then probably the problem doesn’t lie with your code.

 

It could be that your setup (compiler, runtime environment, operating system) is responsible for the failure, or that your understanding of what the API or algorithm is supposed to be doing is incorrect, or less likely, that you have discovered a bug in the third-party code.

 

With a handy verified working example, there are two approaches for fixing your code. Both involve gradually minimizing the differences between the example and the faulty code. By definition, when there are no differences between the working example and your code, your code will be working.

 

The first approach involves building on the example to arrive at your code. This works best when your code is simple and self-contained.

 

In small steps, add to the example elements from your code. At each step, verify the example’s functioning. The addition that causes the code you’re building to stop working is the failure’s culprit.
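When each addition can be expressed as a change to the current version, this procedure can be mechanized. A minimal sketch with hypothetical names: starting from the verified example, apply the additions one at a time, check after each step, and report the first addition that makes the check fail. The configuration dictionaries in the usage part are made up for illustration.

```python
def first_breaking_addition(example, additions, works):
    """Return the index of the first addition after which works() fails,
    or None if the fully built version still works.

    example   -- the verified working starting point
    additions -- functions that each take a version and return a new one
    works     -- predicate that tests a version
    """
    if not works(example):
        raise ValueError("the example itself must work")
    version = example
    for index, add in enumerate(additions):
        version = add(version)
        if not works(version):
            return index  # this addition is the failure's culprit
    return None


if __name__ == "__main__":
    additions = [
        lambda v: {**v, "retries": 3},
        lambda v: {**v, "timeout": -1},  # planted fault
        lambda v: {**v, "verbose": True},
    ]
    print(first_breaking_addition({"timeout": 30}, additions,
                                  lambda v: v["timeout"] > 0))  # 1
```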

 

The second approach amounts to trimming your code until it matches the example. This works best when your code has many dependencies that hinder its isolated operation.

 

Here you remove or adjust material in your code to make it match the example. Do this in small steps and, after each change, check that your code keeps failing. The change you perform that makes your code work will point you to the fix you need to make.

 

Things to Remember

To find the element that causes a failure, gradually trim down your failing code to match a working example or make a working example match your failing code.

 

Fail Fast

The fast and efficient reproduction of a problem will improve your debugging productivity. Therefore, configure the software to fail at the first sign of trouble. Such a failure will make it easier for you to pinpoint the corresponding fault because the failing code will be executed relatively soon after the code that caused the failure, and may even be located close to it.

 

In contrast, allowing the software to continue running after a minor failure can lead the code’s operation into uncharted territory where a cascade of other problems will make the location of a bug much more difficult.

 

Failing quickly entails the risk of focusing on the wrong problem. However, if you fix that problem and restart your debugging, you have eliminated forever a source of doubt. Through a process of gradual elimination, you’re making progress. Again, allowing minor problems to linger can bring about death from a thousand cuts.

 

Here are some ways to speed up your program’s failures.

  1. Add and enable assertions to verify the validity of routines’ input arguments and the success of API calls. In Java, you enable assertions at runtime with the -ea option. In C and C++ you typically enable assertions at compile time by not defining the NDEBUG macro identifier.
  2. Configure libraries for strict checking of their use.
  3. Check the program’s operation with dynamic program analysis methods.
  4. Set the Unix shell’s -e option to make shell scripts terminate when a command exits with an error (a non-zero exit status).
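A sketch of two of these trip wires in Python, with a hypothetical average() routine. Python executes assert statements unless the interpreter is started with the -O option, so the check on the argument stops the program at the bad call. For external commands, subprocess.run() with check=True raises CalledProcessError on a non-zero exit status, much like the shell’s -e option.

```python
import subprocess
import sys


def average(values):
    """Hypothetical routine that checks its input before doing any work."""
    assert len(values) > 0, "average() needs at least one value"
    return sum(values) / len(values)


def run_checked(command):
    """Run a command; raise CalledProcessError at once on a non-zero exit."""
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    print(average([2, 4]))  # 3.0
    try:
        # A child process that deliberately exits with status 3.
        run_checked([sys.executable, "-c", "raise SystemExit(3)"])
    except subprocess.CalledProcessError as err:
        print("command failed with exit status", err.returncode)  # 3
```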

 

Note that while failing fast is an effective way to debug a self-contained program, it may not be a suitable way to run a large production system that has graduated from development to maintenance.

 

There, the priority is likely to be resilience: in many cases, allowing the system to operate after a minor failure (for example, a problem loading an icon image or a crash of one among many server processes) may be preferable to bringing the whole system down. This permissive mode of operation can be counterbalanced by other measures, such as extensive monitoring and logging.

 

Things to Remember

When debugging, set up trip wires so that your program will fail at the first sign of trouble.

 

Consider Rewriting the Suspect Code in Another Language

When the code you’re trying to fix refuses to comply, drastic measures may be in order. One such measure is to rewrite the offending code in another language. By choosing a better programming environment, you hope either to side-step the bug completely or to find and fix it by using the better facilities the new environment offers.

 

The programming language you’ll employ should be more expressive than the one currently used.

 

For example, the assessment of sophisticated trading strategies’ performance can be expressed more easily in a language with functional programming support, such as R, F#, Haskell, Scala, or ML. You can also gain more expressiveness through a language’s libraries.

 

In some cases, such as in the use of R for statistical computing, the gains can be so big as to make it criminal to use a less featureful alternative.

 

As another example, if your code is doing tricky string processing over dynamically allocated collections of elements in C, you may want to try rewriting the code in C++ or Python. Writing the faulty code in a more expressive language will result in a more compact implementation, which offers fewer chances for errors.

 

Another trait you might find useful is the ability to observe the code’s behavior easily, perhaps while constructing the code incrementally. Here, scripting languages offer a particular advantage through the read-eval-print loop (REPL) they support.

 

If you implement an algorithm using Unix tool pipelines, then you can build the processing pipeline step by step, verifying the output of each stage before adding the next one.

 

Furthermore, if the development system originally used doesn’t offer a decent debugging, logging, or unit testing framework, then adopting an improved implementation environment can provide you with the opportunity to pinpoint the problem by using its shiny support facilities. This can be very useful when you’re debugging code in small embedded systems with lackluster development tools.

 

Once you get the newly written code working, you have two options for fixing the original problem. The first involves adopting the new code and trashing the old one. You can easily do this when there are good bindings between the original language and the new one.

 

For instance, it’s typically trivial to call C++ from C with a few plain-data parameters and a simple return value. You can also keep your new implementation by invoking it as a separate process or microservice. However, this makes sense only when you don’t particularly care about the invocation’s cost.

 

The other option for fixing your bug involves using the new code as an oracle to correct the old one.

 

You can do this by observing the differences in behavior between the working code and the failing one, or by gradually converging the two code bases until the bug surfaces. In the first case, you compare variable values and routine return values between the two implementations; in the second one, you proceed by trial and error until you arrive at a correct implementation.
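Using a working rewrite as an oracle amounts to running both implementations on the same inputs and reporting the first disagreement. The following is a small sketch of that idea. To keep it self-contained, both the hypothetical failing implementation (legacy_median, with a planted bug) and its hypothetical rewrite (rewritten_median) are written in Python. In practice the oracle would live in the new language and be invoked across the language boundary.

```python
import random
import statistics


def legacy_median(values):
    """Hypothetical original implementation, with a planted bug."""
    ordered = sorted(values)
    # Bug: for even-length input it returns the upper middle element
    # instead of the mean of the two middle elements.
    return ordered[len(ordered) // 2]


def rewritten_median(values):
    """Hypothetical working rewrite, used as the oracle."""
    return statistics.median(values)


def find_discrepancy(suspect, oracle, cases):
    """Return (case, expected, actual) for the first case where the two disagree, else None."""
    for case in cases:
        expected, actual = oracle(case), suspect(case)
        if expected != actual:
            return case, expected, actual
    return None


if __name__ == "__main__":
    rng = random.Random(2019)  # fixed seed, so any reported discrepancy is reproducible
    cases = [[rng.randint(0, 9) for _ in range(rng.randint(1, 6))] for _ in range(100)]
    print(find_discrepancy(legacy_median, rewritten_median, cases))
```

The reported case, such as an even-length list, is a small, concrete input to take back to the original code.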

 

Things to Remember

  1. Rewrite code you can’t fix in a more expressive language to minimize the number of potentially faulty statements.
  2. Port buggy code to a better programming environment to enhance your debugging arsenal.
  3. Once you have an alternative working implementation, adopt it, or use it as an oracle to fix the original one.

 

Improve the Suspect Code’s Readability and Structure

Disorderly, badly written code can be a fertile breeding ground for bugs. Cleaning up the code can uncover the bugs, allowing you to fix them. However, before embarking on a code-cleaning trip, ensure that you have the time and authority to complete it.

 

Nobody will take kindly to a bug fix that modifies 4,000 lines. At the very least, separate cosmetic changes, refactorings, and the actual bug fix into distinct commits. In some environments, coordinating with others for the first two kinds of changes may be the way to go.

 

Start with spacing. At the lowest level, ensure that the code consistently follows the language’s and your local style rules regarding spaces around operators and reserved words.

 

This can help your eye catch subtle errors in statements and expressions. At a slightly higher level, look at the indentation. Again, this should always use the same number of spaces (typically 2, 4, or 8) applied in a consistent way.

 

With orderly indentation, it’s easier to follow the code’s control flow. Be especially careful with single statements spanning multiple lines: their appropriate indentation will help you verify the correctness of complex expressions and function calls. At the highest level, use your judgment to add spaces where these can help the reader understand the code.

 

Aligning similar expressions with some extra spacing can make discrepancies stand out. Separating logical code blocks with an empty line can make it easier for you to understand the code’s structure.

 

In general, ensure that the visual appearance of the code mirrors its functionality so that your eye can catch suspect patterns. If the code’s formatting is really beyond manual salvation, consider using your IDE or a tool, such as clang-format or indent, to fix it automatically.

 

The code’s formatting can improve its visual appearance, but it can only go so far. Therefore, after style fixes, consider whether there’s a need to refactor the code: maintain its functionality while improving its structure.

 

Your objective here is either to fix the problem through a more orderly structure (an approach akin to rewriting) or to make the fault stand out in the more orderly code.

 

Here are some common problems (code smells) that can hide faults and the refactorings you can implement to solve them. Most are derived from Martin Fowler’s classic book, Refactoring: Improving the Design of Existing Code (Addison-Wesley, 2000), which you can consult for more details.

 

Duplicated code can introduce bugs when code improvements and fixes fail to update all related instances of the code.

 

By putting the duplicated code into a commonly used routine, class, or template you can ensure that the correct code is used throughout the program. If partial code updates were the source of the failure, then you will discover them as you compare the code instances you remove.

 

Duplicated code also hides in switch statements, which often change the code’s flow based on a value representing the data’s type. Missing case elements in some switch statements can easily go unnoticed when new cases are added. As a simple measure, you can add a default clause that will log an internal error when it is executed.
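The default-clause measure can be sketched in Python, which has no classic switch statement. This sketch uses an if/elif chain over a type tag, and its final else plays the role of the default clause. The area function and the dict-based shape encoding are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def area(shape):
    """Return the area of a shape described by a dict with a 'kind' tag."""
    kind = shape["kind"]
    if kind == "square":
        return shape["side"] ** 2
    elif kind == "rectangle":
        return shape["width"] * shape["height"]
    else:
        # Default clause: catches kinds that were added elsewhere in the
        # program but never handled here, instead of silently falling through.
        logger.error("internal error: area() does not handle kind %r", kind)
        return None


if __name__ == "__main__":
    logging.basicConfig()
    print(area({"kind": "square", "side": 3}))      # 9
    print(area({"kind": "circle", "radius": 1.0}))  # logs an internal error, then prints None
```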

 

Even better, restructure the code to eliminate the switch statement. This is typically done by moving the behavior associated with each case into a method of corresponding subclasses, and replacing the switch statement with a polymorphic method call.

 

Alternatively, you can express the behavior in subclasses of a state object, which is used in place of the switch statement.
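The following sketch shows the polymorphic restructuring in Python, using hypothetical shape classes. Each case of a tag-based switch becomes a subclass method, and the switch becomes a single s.area() call. Declaring area as an abstract method adds a trip wire: a new subclass whose author forgot to implement it cannot even be instantiated.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Each concrete shape supplies its own area computation."""


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Rectangle(Shape):
    def __init__(self, width: float, height: float):
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


def total_area(shapes):
    # A polymorphic call replaces the switch on a type tag.
    return sum(s.area() for s in shapes)
```

Defining class Triangle(Shape) without an area method and then calling Triangle() raises a TypeError at the point of construction. This is instead of a missing case that surfaces much later.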

 

A related problem is the shotgun surgery code smell, where a single change affects many methods and fields. The bug you’re chasing may be a change that someone forgot to implement.

 

By moving all the fields and methods that need change into the same class, you can ensure that these are consistent with each other. The class can be an existing one, a new one, or an internal (nested) one that localizes the required changes.

 

Also exposed to the risk of inconsistent changes are data clumps: data objects that commonly appear together. Group these into a class and use objects of that class both as parameters and as return values. This change will eliminate the multiple data objects and the risk of forgetting one.
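As an illustration, consider a hypothetical overlap(a_start, a_end, b_start, b_end) routine that returns a start/end tuple. Its date pairs form a data clump. The Python sketch below groups each pair into a hypothetical DateRange class, used both as parameter and as return value, so that the pair is validated in one place and can no longer be passed half-complete or in the wrong order.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DateRange:
    """A start/end pair that previously travelled as two separate parameters."""
    start: date
    end: date

    def __post_init__(self):
        # One place to validate the clump, instead of at every call site.
        if self.end < self.start:
            raise ValueError(f"range ends before it starts: {self.start} > {self.end}")

    def days(self) -> int:
        return (self.end - self.start).days


def overlap(a: DateRange, b: DateRange) -> Optional[DateRange]:
    """Return the overlap of two ranges, or None if they are disjoint."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return DateRange(start, end) if start <= end else None
```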

 

When a language’s primitive values, such as integers or strings, are used to express more sophisticated values, such as currencies, dates, or zip codes, errors in the manipulation of these values can go unnoticed. For example, if currency values are represented as integers, the code can easily add amounts expressed in two different currencies.

 

Introduce classes to represent such objects, and replace the primitive values with these objects. Similarly, the use of containers (linked lists or resizable vectors) instead of primitive arrays can help you fix errors associated with array size management.
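A minimal Python sketch of replacing integer currency values with a class follows. The Money class, its field names, and the choice of integer minor units (such as cents) are assumptions made for illustration. Adding amounts in different currencies, or adding a bare integer, now fails immediately instead of producing a silently wrong total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """An amount in minor units (e.g., cents) tagged with its currency code."""
    cents: int
    currency: str

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # refuse to mix money with bare integers (yields TypeError)
        if other.currency != self.currency:
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.cents + other.cents, self.currency)
```

Money(150, "EUR") + Money(250, "EUR") gives Money(400, "EUR"). Money(150, "EUR") + Money(250, "USD") raises a ValueError.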

 

A further step away from primitive types involves the use of bespoke classes, rather than naked floating point types, to represent physical units (time, mass, force, energy, acceleration).

 

With proper methods to combine these (e.g., F = ma), you can catch errors arising from their improper use: assigning apples to oranges, as it were.
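The following is a sketch of the unit-class idea in Python, with hypothetical Mass, Acceleration, and Force classes whose values are in SI units. The only product defined for a Mass is F = ma. Combinations the classes don't define, such as adding a force to a mass, raise a TypeError.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acceleration:
    m_per_s2: float


@dataclass(frozen=True)
class Force:
    newtons: float


@dataclass(frozen=True)
class Mass:
    kg: float

    def __add__(self, other):
        # Masses can be added only to masses.
        if isinstance(other, Mass):
            return Mass(self.kg + other.kg)
        return NotImplemented

    def __mul__(self, other):
        # F = ma: the only product of a Mass that this sketch defines.
        if isinstance(other, Acceleration):
            return Force(self.kg * other.m_per_s2)
        return NotImplemented
```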

 

Varying interfaces, such as the set of methods supported by a class, method names, and order and type of their parameters, can cloud the code’s structure and thereby hide bugs. Homogenize method names through renaming, and parameters through their reordering.

 

Add, remove, and move methods to homogenize their classes. The classes with their new similar interfaces may reveal more refactoring opportunities, such as the extraction of a superclass.

 

Long routines can be difficult to follow and debug. Break them into smaller pieces, and decompose complex conditionals into routine calls. If a method is difficult to break up due to many temporary variables, consider changing it into a method object where these variables become the object’s fields.
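The following Python sketch decomposes a complex conditional into routine calls. It is written in the spirit of the Decompose Conditional refactoring. The tariff constants and function names are hypothetical. The dense "is this date in summer?" test and the two charge formulas each get a name, so each one can be checked or unit-tested on its own.

```python
from datetime import date

# Hypothetical tariff values, for illustration only
SUMMER_START = (6, 1)  # (month, day)
SUMMER_END = (8, 31)
SUMMER_RATE = 2
WINTER_RATE = 3
WINTER_SERVICE_CHARGE = 10


def is_summer(day: date) -> bool:
    # Tuple comparison orders (month, day) pairs chronologically within a year.
    return SUMMER_START <= (day.month, day.day) <= SUMMER_END


def summer_charge(quantity: int) -> int:
    return quantity * SUMMER_RATE


def winter_charge(quantity: int) -> int:
    return quantity * WINTER_RATE + WINTER_SERVICE_CHARGE


def charge(quantity: int, day: date) -> int:
    # The decomposed conditional now reads like its specification.
    return summer_charge(quantity) if is_summer(day) else winter_charge(quantity)
```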

 

Code parts that are inappropriately intimate with each other can hide incorrect interactions that destroy invariants and disrupt the program’s state. Break these by moving methods and fields, and ensure associations between classes are unidirectional rather than bidirectional.

 

Long chains of delegation can provide clients with more access than what is strictly needed, which can, in turn, be a source of errors. Break these by introducing delegate methods. Thus the expression

account.getOwner().getName()

with the introduction of the getOwnerName delegate method becomes

account.getOwnerName()
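A Python rendition of the delegate-method refactoring might look as follows. The Java-style names are transliterated to snake_case, and the Person class and its fields are hypothetical. The method form mirrors the getOwnerName example; idiomatic Python code might expose it as a property instead.

```python
class Person:
    def __init__(self, name: str):
        self._name = name

    def get_name(self) -> str:
        return self._name


class Account:
    def __init__(self, owner: Person):
        self._owner = owner

    def get_owner_name(self) -> str:
        # Delegate method: clients ask the account for the name and no longer
        # need (or get) direct access to the Person object it holds.
        return self._owner.get_name()
```

A client now writes account.get_owner_name() and can no longer modify the owner through a reference it obtained along the way.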

 

Surprisingly, comments can also point to trouble spots when they’re used to veil incomprehensible or suboptimal code. Often it is enough to replace a commented block of code with a method whose name reflects the original code’s comment.

 

The resulting short sequence of method calls will make it easier for you to spot errors in the code’s logic. In other cases, an assertion will more effectively express the preconditions written in a comment because the assertion will readily fail when the precondition isn’t satisfied.
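Here is a Python sketch of a comment that becomes both a named method and an assertion. The binary_search and is_sorted functions are hypothetical examples. The precondition that was once only written as a comment ("items must be sorted") is now checked on every call.

```python
def is_sorted(items) -> bool:
    """True if items are in non-decreasing order."""
    return all(a <= b for a, b in zip(items, items[1:]))


def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if it is absent."""
    # Previously only a comment: "items must be sorted in ascending order".
    # As an assertion, the precondition fails loudly when a caller violates it.
    assert is_sorted(items), "binary_search() requires sorted input"
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(items) and items[lo] == target else -1
```

Note that the check costs linear time on each call, which defeats the logarithmic search. It suits a debugging session, and running Python with -O removes it.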

 

As a final step, remove dead code and speculative generality. Remove unused code and parameters, collapse unused class hierarchies, inline classes with a single client, and rename methods with funky abstract names to reflect what they actually do. Your aim here is to eliminate hiding places for bugs.

 

Things to Remember

  1. Format code in a consistent manner to allow your eye to catch error patterns.
  2. Refactor code to expose bugs hiding in badly written or needlessly complex code structures.
