Debugging

Programs are written by humans, and humans make mistakes. Software errors are not entirely inevitable. The main tools we have to prevent them are defensive programming and testing.

Software bugs fall into a few broad categories, and understanding these helps us reason about them. The difficulty of finding a bug is related to its category. The three classes of bugs are failure to compile, run-time crash, and unexpected behavior. The longer it takes to detect a fault, the more it costs to fix.

Failures to compile are the best type of error you can get; they are usually due to a syntactic mistake or a simple oversight.

Run-time crashes occur when the executable is run. These are much harder to deal with than compilation errors but are still relatively simple to track down.

Unexpected behavior is the nastiest type of bug. It is usually a minute logic problem deep in code that executed half an hour ago.

Run-time errors can be sub-grouped into syntactic errors, build errors, basic semantic bugs, and semantic bugs.

Syntactic errors are usually caught by the compiler at build time, but sometimes language grammar errors slip through undetected: typing = where you meant ==, & where you meant &&, forgetting a semicolon, and so on. The best way to avoid these is to compile with all warnings switched on.
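A minimal sketch of the = versus == and & versus && slips described above. The function names are hypothetical. Both buggy versions compile; gcc reports the assignment-in-condition case when run with -Wall (via -Wparentheses), while the & case usually passes without any warning, which is why it has to be caught by testing or review.

```c
#include <stdbool.h>

/* Buggy: '=' assigns 0 to x instead of comparing it, so the
   condition is always false. gcc -Wall warns about this. */
bool is_zero_buggy(int x)
{
    if (x = 0)
        return true;
    return false;
}

/* Fixed: '==' compares and leaves x untouched. */
bool is_zero(int x)
{
    return x == 0;
}

/* Buggy: bitwise '&' of two nonzero values can be 0 (1 & 2 == 0),
   so "both are nonzero" is wrongly reported as false. */
bool both_nonzero_buggy(int a, int b)
{
    return a & b;
}

/* Fixed: logical '&&' tests each operand for nonzero. */
bool both_nonzero(int a, int b)
{
    return a && b;
}
```

With these definitions, is_zero_buggy(0) returns false and both_nonzero_buggy(1, 2) returns false, while the fixed versions return true for the same arguments.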

Build errors may only manifest themselves at run time. Always distrust your build system, no matter how good you think it is. These problems can take time to track down, so if you are in doubt, clean out your project completely and rebuild it from scratch.

Basic semantic bugs are the majority of run-time faults and are simple errors causing incorrect behavior. Using an uninitialized variable will cause the program's behavior to depend on the garbage value in the memory location used by the variable. Other common basic semantic faults are comparing floating-point variables for equality, calculations that do not handle numerical overflow, rounding errors, and other type errors. Often this kind of semantic fault can be caught with static code analysis.
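Two of the basic semantic faults named above, floating-point equality and unchecked overflow, can be sketched as follows. The helper names and the tolerance parameter are assumptions for illustration; a suitable tolerance depends on the application and the magnitudes involved.

```c
#include <limits.h>
#include <stdbool.h>

/* Compare doubles against a tolerance instead of with ==,
   because rounding means 0.1 + 0.2 is not exactly 0.3. */
bool nearly_equal(double a, double b, double tolerance)
{
    double diff = a > b ? a - b : b - a;
    return diff <= tolerance;
}

/* Add two ints, reporting overflow instead of letting it happen
   (signed overflow is undefined behaviour in C). Returns false and
   leaves *sum unchanged if a + b would not fit in an int. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

An absolute tolerance like this is the simplest choice; for values of widely varying magnitude a tolerance relative to the operands is often more appropriate.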

Semantic bugs are insidious errors that inspection tools won't catch. They can be low-level errors, like using the wrong variable, not validating input, or writing an incorrect loop, or high-level ones, like calling an API incorrectly. The best of these are the repeatable ones. The worst are the ones that cause memory corruption.

The semantic bugs can then be further divided into subcategories: segmentation faults, memory overruns, memory leaks, running out of memory, math errors, and program hangs.

Segmentation faults (also known as protection faults) are caused when memory locations not allocated for the program's use are accessed. This is far too easy to do in C. A common C typo causing a segfault is:

scanf("%d", number);

The missing & before number causes scanf to try to write into the memory location referenced by the (garbage) contents of number.
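For contrast, a minimal sketch of the corrected call. The function names are hypothetical. Passing &number gives scanf the address to write to, and checking the return value (the number of fields successfully converted) also catches input that is not a number. The string version uses sscanf, which behaves the same way but is easier to test.

```c
#include <stdbool.h>
#include <stdio.h>

/* Reads an int from stdin. Note the &: scanf needs the address of
   the variable to write into, not its (possibly garbage) value. */
bool read_number(int *out)
{
    int number;
    if (scanf("%d", &number) != 1)   /* 1 == one field converted */
        return false;
    *out = number;
    return true;
}

/* Same idea for a string; *out is left unchanged on failure. */
bool parse_number(const char *text, int *out)
{
    int number;
    if (sscanf(text, "%d", &number) != 1)
        return false;
    *out = number;
    return true;
}
```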

Memory overruns are caused by writing past the end of the memory allocated for your data structure. In an unprotected operating system, this may even tamper with data belonging to another process or to the OS itself. Use safe data structures whenever possible to avoid these errors.
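One way to read "safe data structures" in C is a container that knows its own size and refuses out-of-range accesses. The sketch below is a hypothetical fixed-capacity IntBuffer, not a standard type; the capacity of 4 is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

#define INT_BUFFER_CAPACITY 4

/* A fixed-size array that carries its own element count. */
typedef struct {
    int items[INT_BUFFER_CAPACITY];
    size_t count;
} IntBuffer;

/* Appends value; returns false instead of writing past the end. */
bool int_buffer_push(IntBuffer *buf, int value)
{
    if (buf->count >= INT_BUFFER_CAPACITY)
        return false;
    buf->items[buf->count++] = value;
    return true;
}

/* Copies element i into *out; returns false if i is out of range. */
bool int_buffer_get(const IntBuffer *buf, size_t i, int *out)
{
    if (i >= buf->count)
        return false;
    *out = buf->items[i];
    return true;
}
```

Every access goes through a bounds check, so a caller that pushes one element too many gets a false return value it can handle rather than silently corrupting neighbouring memory.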

Memory leaks are a constant threat in languages that do not have garbage collection. Anything that you manually acquire must be manually released.
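Leaks most often hide on error paths, where a function returns early while still holding memory it allocated. A minimal sketch, using a hypothetical Record type, of releasing everything already acquired before bailing out, paired with a single function that releases it all afterwards:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record whose parts are allocated separately. */
typedef struct {
    char *name;
    int *scores;
} Record;

/* Acquires all parts or none: if a later allocation fails, the
   earlier ones are freed before returning NULL, so nothing leaks. */
Record *record_create(const char *name, size_t score_count)
{
    Record *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->name = malloc(strlen(name) + 1);
    if (r->name == NULL) {
        free(r);                      /* release what we already hold */
        return NULL;
    }
    strcpy(r->name, name);
    r->scores = calloc(score_count, sizeof *r->scores);
    if (r->scores == NULL && score_count > 0) {
        free(r->name);
        free(r);
        return NULL;
    }
    return r;
}

/* Each successful record_create needs exactly one record_destroy. */
void record_destroy(Record *r)
{
    if (r == NULL)
        return;
    free(r->scores);
    free(r->name);
    free(r);
}
```

A leak checker such as Valgrind, run with --leak-check=full, can confirm that every allocation in a test run is eventually freed.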

Program hangs are usually caused by bad program logic. Infinite loops are the most common. Deadlocks and race conditions can also occur in threaded code, and event-driven code can end up waiting for events that will never occur.
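A classic C infinite loop is a count-down loop over an unsigned index. The sketch below (function names are hypothetical) shows the buggy form and one common fix. gcc flags the buggy comparison as always true when run with -Wextra (via -Wtype-limits).

```c
#include <stddef.h>

/* Buggy, never call: i is unsigned, so "i >= 0" is always true.
   After i reaches 0, i-- wraps to SIZE_MAX and the loop never
   terminates (while also reading far out of bounds). */
long sum_backwards_buggy(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = n - 1; i >= 0; i--)
        total += values[i];
    return total;
}

/* Fixed: test before decrementing, so the body runs for
   i = n - 1 down to 0 and the loop then stops. Also safe for n == 0. */
long sum_backwards(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; )
        total += values[i];
    return total;
}
```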

It has been found that some programmers introduce far fewer faults (60 percent fewer), find and fix faults faster (in 35 percent of the time), and introduce fewer new faults while doing so. The key is paying attention to the code you write at a microscopic level while also keeping the bigger picture in mind. The single most important rule of debugging (the golden rule) is this: use your brain.

If you have to debug some code, the first thing you should do is learn about it. You can't expect to find errors in code that you don't understand.

When a compile-time error makes the compiler spit out a lot of error messages, look at the first one: it should be trusted far more than the subsequent messages, which are often knock-on effects of the first error. Sometimes the syntax error is actually on the line before the one the compiler reports.

Run-time errors must be found methodically. Finding the bug is a process of confirming what you think is correct until you find the place where that assumption doesn't hold. The first step is figuring out how to reproduce the failure reliably. When locating the fault, a good place to start is where the error manifests itself. Divide and conquer is a good strategy for narrowing it down. A dry run is another technique: you play the role of the computer, trace the program's execution by hand, and compare your result with reality. Once you think you have found the cause, investigate it thoroughly to prove that you are right. Next, write a test case that demonstrates the failure. Now for the easy part: fix the bug.
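The divide-and-conquer step can be sketched as a bisection, the same idea behind tools such as git bisect. This assumes the candidates (revisions, input sizes, data records) can be numbered so that all of them pass up to some point and all fail after it; the fails callback and the example predicate are hypothetical stand-ins for actually reproducing the bug.

```c
#include <stdbool.h>

/* Hypothetical callback: true if the bug reproduces for candidate i. */
typedef bool (*fails_fn)(int i);

/* Returns the first failing candidate in [lo, hi], or -1 if hi itself
   passes. Each probe halves the remaining range, so roughly
   log2(hi - lo + 1) runs are needed instead of one per candidate. */
int first_failing(int lo, int hi, fails_fn fails)
{
    if (lo > hi || !fails(hi))
        return -1;
    while (lo < hi) {                 /* invariant: candidate hi fails */
        int mid = lo + (hi - lo) / 2;
        if (fails(mid))
            hi = mid;                 /* first failure is at or before mid */
        else
            lo = mid + 1;             /* first failure is after mid */
    }
    return lo;
}

/* Example predicate standing in for a real reproduction step:
   pretend the bug appears from candidate 7 onwards. */
bool fails_from_seven(int i)
{
    return i >= 7;
}
```

For example, first_failing(0, 10, fails_from_seven) narrows eleven candidates to 7 in four probes after the initial check of 10. If the pass/fail boundary is not monotonic, bisection can land on the wrong candidate, so the result should still be confirmed as the text describes.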

If none of this works, try explaining the problem to someone else.

Convince yourself that you have really found the root cause of the problem and are not just hiding a symptom. When you fix the bug, check to see if the same mistake is lurking in related sections of code. Think about the lessons learned from each fault. How could it have been prevented? How could it have been discovered more quickly?

The most important rule for not introducing bugs in the first place: use your brain.

Article notes

What are the two main tools we have to prevent software errors according to Code Craft?

What are the three classes of bugs according to Code Craft?

What is the nastiest type of bug according to Code Craft?

What is the most common type of program hang according to Code Craft?

What type of variable, if you use it in a program, will cause the program's behavior to depend on the garbage value in the variable's memory location?

What is the first step when debugging some code?

What error subtype of run-time crash is usually caught by the compiler but, when it isn't, is usually a language grammar error?

What is a common syntactic (language grammar) error that might happen in a conditional?

What error subtype of run-time crash would include a Rails app using a version of a gem that was installed from local development rather than the actual released version of the gem?

An example of a basic semantic bug is using an uninitialized variable; what will the behavior of the program depend on in this case?

What is a common basic semantic fault related to floating-point numbers?

What subtype of semantic bug is caused when memory locations not allocated for the program's use are accessed?

What type of semantic bug is when memory is written past what has been allocated for a data structure?

What type of semantic bug includes infinite loops, deadlock and race conditions, and event-driven code waiting for an event that will never occur?

What is the key to finding and fixing faults quicker while also introducing fewer new faults?

What is the golden rule of debugging according to Code Craft?

When a compile-time error results in the compiler spitting out a lot of error messages, usually the most important one to look at is:

What technique to investigate a run-time error has you play the role of the computer, trace program execution and compare your result with reality?