The Debugging Guide

Even the most seasoned programmer makes mistakes, so implementing software will always involve some amount of debugging: finding and fixing errors in your code. While there are many tools that can aid in debugging, using these tools is not necessarily hard; what is hard is understanding the errors you encounter, and devising a strategy to resolve those errors.

In this guide, we are going to cover the following: the types of programming errors, debugging techniques, and how to ask for help.

We will use C and Python examples to explain some of these concepts. Readers with a C background are assumed to be familiar with how to use GCC from the command-line, as well as with compiling and linking in general. Readers with a Python background are assumed to know how to run Python code from a terminal, or from inside the Python interpreter.

The examples can be found in the examples directory of our GitHub repository: https://github.com/uchicago-cs/debugging-guide

This guide was originally written for CMSC 22000 Introduction to Software Development at the University of Chicago.

Types of Programming Errors

We can classify programming errors into three general categories: build errors, runtime errors, and logic errors.

Build Errors

Build errors occur when we try to build our code (either with a compiler, or on-the-fly with an interpreter), and are unable to do so because of an issue in our code (this means we can’t even run our code).

Syntax errors

A syntax error occurs when we write code that doesn’t conform to the programming language’s specification. This type of error is, arguably, the easiest to fix, because modern interpreters/compilers will often tell us exactly what is wrong in our code (down to the line where the error was detected). However, interpreting these error messages can sometimes be challenging.

In the examples/syntax-error directory, try compiling syntax-error.c:

gcc syntax-error.c -o syntax-error

It should produce the following error:

syntax-error.c: In function 'main':
syntax-error.c:7:5: error: expected ';' before 'return'
     return 0;
     ^

This message seems pretty helpful: it is telling us the file, function, and line (7) where the error was detected. It’s even telling us the character in that line that triggered the error (the fifth character).

However, if we look at the code, we can see that the error actually originates in line 5:

The line is missing a semicolon at the end, but this issue is not detected until line 7 (when the compiler encounters a “return” statement where it expected a semicolon).

Debugging Rule of Thumb #1

When an interpreter/compiler gives you a line number, it doesn’t necessarily mean that line is incorrect (and that it has to be fixed). Always look at the offending line and the lines before it to see if you can spot the issue.
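Python readers can observe the same effect. The following is a minimal sketch, not one of the guide's examples: BROKEN_SOURCE and reported_line are hypothetical names, and the two-line program is missing a closing parenthesis on line 1. Depending on the interpreter version, the error may be reported on line 2 (where the parser first gets confused) or on line 1 (newer interpreters tend to point back at the unclosed parenthesis), so the reported line is a starting point for the search rather than a verdict.

```python
# Hypothetical two-line program: the closing parenthesis is missing on line 1.
BROKEN_SOURCE = "total = (1 + 2\nprint(total)\n"


def reported_line(source):
    """Return the line number Python reports for a syntax error, or None."""
    try:
        # compile() parses the source without running it.
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.lineno
    return None


if __name__ == "__main__":
    print("error reported at line", reported_line(BROKEN_SOURCE))
```

Whichever line is reported, the fix belongs on line 1, which is why it pays to read the lines around the reported location.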

Linker errors

This section is only applicable to C programming. If you are using Python, skip to Runtime Errors.

In the examples/link-error/ directory, try doing this:

gcc main.c greet.c -o greet

It should produce the following error:

/tmp/ccyLGeRy.o: In function `main':
main.c:(.text+0x14): undefined reference to `bye'
collect2: error: ld returned 1 exit status

This is a linker error, not a compiler error. Remember what’s happening behind the scenes:

When compiling a single C file, the resulting object file can have undefined references. This is normal, and even expected: the linker will take care of resolving those undefined references. What the error above is telling us is that the linker was unable to resolve one such reference.

If we look at main.c and greet.h (which is included in main.c), we can see that we call a function called bye, and that it is declared in greet.h. This is why we are able to compile main.c individually. However, if we look at greet.c, we can see that it doesn’t define a bye function; it defines a goodbye function instead.

This is why we get an error:

Linker errors can be more cryptic than compiler errors (notice how the error message doesn’t tell us the line where the undefined reference happens). We can make them slightly less cryptic by using GCC’s -g flag:

$ gcc -g main.c greet.c -o greet
/tmp/ccXCYkEk.o: In function `main':
/home/borja/cs220/link-error/main.c:6: undefined reference to `bye'
collect2: error: ld returned 1 exit status

This flag tells GCC to include debugging information in the object files, which allows the linker to print a more helpful message (because it now knows the exact line where the undefined reference happens). You should get into the habit of always using the -g flag, except when producing the final version of your program (the -g flag increases the size of your object files, and is unnecessary once the software is in production).

Spurious build errors

This section is only applicable to C programming. If you are using Python, skip to Runtime Errors.

Let’s look at another example. In the examples/spurious directory, try doing this:

gcc spurious.c -o spurious

This will produce a lot of compiler warnings, but the compilation will ultimately succeed. However, if we try to run the program, we will get a segmentation fault.

How do we go about solving this error? We need to follow these rules of thumb:

Debugging Rule of Thumb #2

If a compiler spits out multiple errors/warnings, start by trying to resolve only the first one. It is likely that the rest of the errors/warnings are spurious errors (errors that result from a previous error) that will go away as soon as you resolve the first error.

Corollary: Don’t get overwhelmed when you get a huge number of errors. Focus on the first one, solve it, and try compiling again. If you get a different set of errors, repeat the process: focus only on the first one, solve it, etc.

Debugging Rule of Thumb #3

Treat warnings as seriously as compiler errors. While a warning will still allow your program to compile, it may be a sign of trouble further down the road.

Corollary: Do not try to debug runtime errors (like segfaults) until you have resolved all compiler warnings.

In the above example, the warning message is directly related to why we get a segfault later on:

spurious.c: In function 'main':
spurious.c:9:7: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
     s = malloc(100);

This warning can make us realize that we should’ve declared s as char*, not as char. If we fix this, our program will compile and run correctly.

Note: In this simple example, we could’ve spotted the error by code inspection. However, a silly typo like this (forgetting a *) is not uncommon and, in a much larger codebase, it can be hard to track down the issue just by code inspection (unless you heed the compiler’s warnings to focus your search).

Runtime Errors

Runtime errors are errors that make a program crash while it is running.

For example, in the examples/runtime/ directory we have a simple program that takes two integers as command-line parameters and divides them:

$ gcc runtime-error.c -o runtime-error
$ ./runtime-error 10 2
a / b = 5

However, if we try dividing by zero, the program crashes. We get a descriptive error (a floating point exception has happened), but no hints as to what part of our code triggered that error:

$ ./runtime-error 10 0
Floating point exception

Another common type of runtime error is a “Segmentation fault” (or “segfault”), resulting from your program trying to access regions of memory it shouldn’t access. This error is actually triggered by the operating system itself: when it detects that you’re trying to access an illegal memory address, it will simply kill your program and, at least in C, this results in a single very unhelpful error message: “Segmentation fault”.

As we’ll see later on, we will have to use a debugger (like GDB) to find the origin of the error in C but, ultimately, it is generally possible to track runtime errors down to the exact line that triggers them.
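In Python, the same kind of mistake is usually easier to locate, because the interpreter attaches a stack trace to the exception. The following is a minimal sketch, not the code from the examples directory: the divide and run functions are hypothetical, and the standard library's traceback module is used to pull out the name of the function in which the exception was raised.

```python
import traceback


def divide(a, b):
    # Integer division; raises ZeroDivisionError when b is 0.
    return a // b


def run(a, b):
    """Return the program's output line, or a short description of the crash."""
    try:
        return f"a / b = {divide(a, b)}"
    except ZeroDivisionError as exc:
        # The last entry of the extracted traceback is the innermost frame,
        # i.e. the function in which the exception was actually raised.
        innermost = traceback.extract_tb(exc.__traceback__)[-1]
        return f"{type(exc).__name__} raised in {innermost.name}()"


if __name__ == "__main__":
    print(run(10, 2))
    print(run(10, 0))
```

If the exception is left uncaught instead, the interpreter prints the full stack trace on its own, including the file and line number of each call.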

Logic Errors

When your program compiles and runs (without crashes) but exhibits incorrect behavior, that is known as a logic error (i.e., there is an error in the “logic” of your program).

For example, let’s look at distance.c in the examples/logic/ directory. It takes four command-line parameters X1 Y1 X2 Y2 representing two points in 2-dimensional space, and computes the distance between them:

$ gcc distance.c -o distance -lm
$ ./distance 0 0 1 0
The distance from (0.00, 0.00) to (1.00, 0.00) is 1.00
$ ./distance 0 0 3 4
The distance from (0.00, 0.00) to (3.00, 4.00) is 5.00

It looks like this program is working fine. However, if we try a few other values, we get incorrect distances:

$ ./distance -1 0 1 0
The distance from (-1.00, 0.00) to (1.00, 0.00) is 0.00
$ ./distance -1 -1 2 3
The distance from (-1.00, -1.00) to (2.00, 3.00) is 2.24

The correct distances are 2 and 5, respectively. Can you spot what is wrong with the code?

Logic errors are probably the hardest errors to debug because, unlike build and runtime errors, our code actually runs without crashing or producing any noticeable error. However, the behaviour of the code is incorrect (and, in many cases, the program will work correctly in some cases but not in others, like the distance example above). As we’ll describe soon, we have to be especially methodical to track down these kinds of errors.
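To make this concrete, here is a hypothetical Python sketch of a logic error of this kind. It is not the code from distance.c: distance_buggy is an assumed reconstruction in which the coordinates are added instead of subtracted, which happens to reproduce all four outputs shown above (correct when the first point is the origin, wrong otherwise).

```python
import math


def distance_buggy(x1, y1, x2, y2):
    # Hypothetical bug: the coordinates are added instead of subtracted.
    return math.hypot(x1 + x2, y1 + y2)


def distance(x1, y1, x2, y2):
    # Euclidean distance: square root of the summed squared differences.
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    points = [(0, 0, 1, 0), (0, 0, 3, 4), (-1, 0, 1, 0), (-1, -1, 2, 3)]
    for p in points:
        print(p, f"buggy={distance_buggy(*p):.2f}", f"correct={distance(*p):.2f}")
```

Both functions run without crashing on every input; only a comparison against the expected values reveals that one of them is wrong.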

Debugging Techniques

Not all errors are debugged in the same way, and we need to make sure we use the right strategy for the right error.

Most debugging is reactive: an error happens, and we try to find the cause of that error so we can fix it. Debugging is said to be like investigating a murder where you are both the detective and the murderer.

Solving build errors can generally be done by applying the rules of thumb 1-3 we have already described.

Solving runtime errors in C can usually be done by using a debugger, which can provide additional information about the error. In the case of segfaults, it can tell you the exact line where the segfault is happening. Solving runtime errors in Python usually involves interpreting the stack trace we are given to figure out the root cause of the error.

Solving logic errors is trickier (as we saw earlier, we don’t get clues on where the error may be located).

Debugging Rule of Thumb #4

When debugging an error (but especially a logic error), narrow down your search as much as possible. Start by focusing on the likely source of the error, and then refine your search using the techniques described below. Don’t waste your time looking through the parts of the code that actually work.

Below we describe a number of debugging techniques that can be used to narrow down the origin of runtime and logic errors. The first two, print debugging and using a debugger, are reactive techniques: we use them when a bug manifests itself. The remaining ones are preemptive: they involve writing code that doesn’t impact the functionality of our program, but which either helps us catch bugs sooner, or makes it easier for us to debug our program when bugs do surface.

Reactive techniques

Print debugging

We add print statements in our code (printf in C and print in Python) to print out the values of variables (or, more generally, to get a better sense of the state of the running program) and re-run the program. Sometimes, seeing those values will help us immediately realize what the problem is (especially when a variable has a value we did not expect).

For example, let’s say that your program is printing an incorrect value that is computed by calling functions A, B, and C. You should print the return value of each of those functions, to see if it is correct. Let’s say that B returns an incorrect value; you then narrow down your search to function B. You could then print out the values of the local variables at certain key points, to see whether they are being updated correctly.
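The narrowing-down process described above can be sketched in Python as follows. All names here are hypothetical: parse_scores, average, and to_percent stand in for functions A, B, and C, and the bug in average (integer division) is planted for illustration. The debugging output goes to standard error so that it does not mix with the program's normal output.

```python
import sys


def parse_scores(text):            # plays the role of "function A"
    return [int(token) for token in text.split(",")]


def average(scores):               # "function B", with a planted bug
    return sum(scores) // len(scores)   # integer division truncates 7.5 to 7


def to_percent(value, max_score):  # "function C"
    return 100 * value / max_score


def report(text, max_score):
    scores = parse_scores(text)
    print(f"DEBUG parse_scores -> {scores!r}", file=sys.stderr)
    avg = average(scores)
    print(f"DEBUG average -> {avg!r}", file=sys.stderr)
    percent = to_percent(avg, max_score)
    print(f"DEBUG to_percent -> {percent!r}", file=sys.stderr)
    return percent


if __name__ == "__main__":
    print(report("7,8", 10))
```

For the input "7,8" we would expect an average of 7.5, so the second debugging line (which shows 7) tells us to focus on average and to stop inspecting the other two functions.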

Debugging Rule of Thumb #5

Assume nothing, verify everything.

If you find yourself thinking “I’m pretty sure this variable has the expected value” or “No need to bother debugging this function, I’m sure it works correctly”, you should still verify that this is the case.

Using a debugger

Print debugging is an easy technique that sometimes yields immediate results. However, it is a tedious technique if you’re dealing with an insidious bug that requires tracing through several functions, variables, etc. to find the issue. If you find yourself writing more than ten printf statements, that’s a sign you may need to switch to using a debugger.

A debugger allows you to trace the execution of your program, and to observe the evolution of its state (e.g., the values of all the variables) as the program runs. In a sense, this is not that different from print debugging, except you get to automatically “print” every single variable at every single line of code.

Modern development environments, like Eclipse, PyCharm, CLion, etc. include built-in debuggers, and there are also command-line debuggers like GDB for C code and pdb for Python.

Note

This guide currently does not include specific instructions on how to use a debugger. For tracking down C runtime errors, we recommend looking at the Debugging Homework from CMSC 22000 (the class this guide was originally written for).

Preemptive techniques

Assertions

Most modern programming languages include the ability to add assertions in our code. An assertion is a boolean condition which, if false, will make our program exit immediately, with a helpful message telling us what boolean condition failed.

For example, see the array.c file in examples/assertions/. It includes an add_array function that begins with an assertion:

int add_array(int *a, int length) {
    assert(a != NULL);

Without this assertion, calling add_array with a NULL pointer would make our code crash with a segfault. As we saw previously, we would likely be able to trace the exact cause of the error, but it would mean firing up a debugger, etc. With the assertion, on the other hand, the same situation will result in the program exiting with an error message like this:

array: array.c:7: add_array: Assertion `a != NULL' failed.

Assertion errors will tell us the file and line where the assertion failed. This gives us a lot more information than just a Segmentation fault message, and will make the error much easier to debug.

Of course, these are simple examples where you could’ve figured out the issue just by code inspection. However, in much larger programs, where a misbehaving function could cause a series of baffling spurious errors somewhere else in our program, these kinds of logic errors can be much harder to track down. Assertions can help us detect these problems right away, instead of doing print debugging or using a debugger to track down the origin of the problem.
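Python has an assert statement that serves the same purpose. The following is a minimal sketch of a Python counterpart to add_array; it is an assumed equivalent written for illustration, not code from the examples directory, and the message text is our own.

```python
def add_array(values):
    # Fail fast, with a message, if the caller passes None instead of a list.
    assert values is not None, "add_array() requires a list, got None"
    total = 0
    for value in values:
        total += value
    return total


if __name__ == "__main__":
    print(add_array([1, 2, 3]))
    try:
        add_array(None)
    except AssertionError as exc:
        print("Assertion failed:", exc)
```

When the failure is not caught, Python prints a stack trace ending in an AssertionError with our message. Note that Python skips assert statements when run with the -O flag, so assertions should guard against programming mistakes and should not replace the validation of user input.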

Logging

When debugging a logic error, we often have to trace through the execution of the program with a debugger to see whether anything looks “off” (e.g., a variable has a value we did not expect, etc.). We can facilitate this process by adding logging statements that provide information on what the program is doing.

For example, we could add the following statement at the top of the add_array function:

printf("Adding up array with %i elements.\n", length);

Note that we wouldn’t add this in reaction to an error: we would add it from the get-go, because the information included in the log message could be useful if we do have to debug the program. In particular, it would allow us to immediately verify whether the function is receiving the expected inputs (if not, that would give us a clue on where to narrow our search).

In a way, you can think of this as preemptive print debugging. It does have one big drawback, though: now our program’s output will be filled with this kind of debugging information, which we may not always want to print. When adding logging statements to a program, it is more common to use a logging library, which will allow you to control the level of logging (so that you can run your program without logging when you are not debugging it). For C, we could use the log.c library. For Python, we could use Python’s built-in logging library.
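As a rough illustration of controlling the level of logging, here is a minimal sketch using Python's built-in logging module. The logger name array_demo, the configure helper, and the Python version of add_array are all hypothetical names chosen for this example.

```python
import logging

logger = logging.getLogger("array_demo")  # hypothetical logger name


def add_array(values):
    # Logged from the get-go; only emitted when DEBUG logging is enabled.
    logger.debug("Adding up array with %d elements.", len(values))
    return sum(values)


def configure(debugging):
    """Enable DEBUG messages while debugging, and hide them otherwise."""
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    logger.setLevel(logging.DEBUG if debugging else logging.WARNING)


if __name__ == "__main__":
    configure(debugging=False)
    print(add_array([1, 2, 3]))   # no log message is printed
    configure(debugging=True)
    print(add_array([1, 2, 3]))   # the DEBUG message appears on stderr
```

The logging statement stays in the code permanently; whether it produces output is decided in one place, which is what makes this approach more practical than scattering printf or print calls.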

Unit Tests

How to write unit tests is beyond the scope of this guide but, assuming that your code already has unit tests associated with it, there is a rule of thumb you should observe whenever the tests fail:

Debugging Rule of Thumb #6

When your code fails multiple tests, pick the simplest test that is failing, and fix your code to make that test pass.

Once you update your code, re-run the tests. If you pass that test, but still fail others, repeat this same process: pick the simplest test that is failing, and focus on just that test.

The reason for this is similar to Debugging Rule of Thumb #2: having your code fail lots of tests may seem overwhelming, but it is possible that many of the failures are spurious (errors that result from a previous error). Sometimes, fixing your code to pass the simplest test will suddenly make a whole bunch of other tests pass as well. So, it is important that you focus on fixing your code one test at a time.
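Focusing on one test at a time is easier if you know how to run a single test. The following is a minimal sketch using Python's built-in unittest module; the distance function, the TestDistance class, and the run_single helper are hypothetical names used only for illustration, and the sketch is not meant as a guide to writing tests.

```python
import math
import unittest


def distance(x1, y1, x2, y2):
    return math.hypot(x2 - x1, y2 - y1)


class TestDistance(unittest.TestCase):
    # Tests are listed from simplest to most involved.
    def test_same_point(self):
        self.assertAlmostEqual(distance(0, 0, 0, 0), 0.0)

    def test_horizontal(self):
        self.assertAlmostEqual(distance(-1, 0, 1, 0), 2.0)

    def test_general(self):
        self.assertAlmostEqual(distance(-1, -1, 2, 3), 5.0)


def run_single(test_name):
    """Run only the named test method and return the unittest result."""
    suite = unittest.TestSuite([TestDistance(test_name)])
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_single("test_same_point")
    print("passed" if result.wasSuccessful() else "failed")
```

From a terminal, unittest can also run one test by its dotted name, in the form python -m unittest module.TestDistance.test_same_point, where module is whatever file the tests live in.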

Asking for Help

Sometimes, you will get completely stumped when trying to debug an issue, and will need to ask someone for help. Ideally, you will be able to sit down with someone and walk through the issue with them, to see if they can spot something you missed. However, it is very common to have to ask for help through a discussion forum, a chat, etc. In that case, it will be more challenging for someone to provide assistance because they can’t see exactly what you’re seeing, nor can they interactively debug the issue with you.

Below we have compiled a list of guidelines and tips on how to ask effective questions: