\page cs_dg_debugging Debugging
[TOC]
All non-trivial software has bugs, at least at first, so debugging is a part of code development.

- It is often easier to debug newly written code than older code
  - At least when done by the same person
  - Test and debug early, before you forget the details of your code
- Multiple tools and techniques are available
  - Mastering debugging tools and techniques makes debugging less painful
- Bugs detected late are much more costly than those detected early
  - May require re-validation
  - Often harder to debug, because the code details need to be "re-learned"
  - Be proactive, try to detect as many bugs as possible by testing

When encountering or suspecting a bug, choosing the best debugging technique for the situation can lead to one or more orders of magnitude in time savings.

- Choosing which tool to use is the difficult part, and can be based on several factors:
  - Comparison of experience to observed effects
  - Leveraging theoretical knowledge and experience
    - Guessing at whether bugs may be due to uninitialized values, out-of-bounds array accesses, bad option settings, numerical errors, ...
  - Sometimes, a bit of luck...
- code_saturne tries to help, so always check for error messages
  - In code_saturne, `error*`, `run_solver.log`, and messages in batch output logs (or in the console) should be checked.
    - For parallel runs, when both `error` and `error_r*` are present, the latter are the ones which contain the useful information.
  - See [section in user guide](@ref sec_ug_troubleshhoting) for more details.
  - Some graphical checks with `postprocessing/error*` outputs are also available for boundary conditions and linear solvers.
- Source code proofreading
  - (Re-) checking for compiler warnings.
- Interactive debugging
- Memory debugging
- Checks/instrumentation in code...
  - Use `--enable-debug` to configure builds for debug.
    - Enables use of many `assert` checks in C and C++ code.
  - Using recent versions of the GCC or clang compiler, compile/run with AddressSanitizer, UndefinedBehaviorSanitizer, and other tools of this series frequently.
    - Code built this way is not compatible with runs under Valgrind.
    - Not compatible either with some resource limits set on some clusters.
    - Overhead: usually about x3.
  - When you know where to search, print statements may be useful...

The GNU debugger https://www.gnu.org/software/gdb is a broadly available, interactive debugger for compiled languages including C, C++, and Fortran.

- To debug an executable, run `gdb <executable>`
  - Under the gdb prompt, type `help` for built-in help, `q` to quit.
  - Help is grouped in categories
  - The most common options are:
    - `b` (set breakpoint),
    - `c` (continue to next breakpoint),
    - `s` (step into function),
    - `p` (print).
- Many front-ends are available, including:
  - A built-in user interface for terminals.
  - Integration with text editors, especially
    - Emacs (built-in),
    - vim (Conque GDB, Termdebug).
  - Standalone graphical interfaces:
  - Integration in development environments:
- GDB can provide some information on any compiled program, but provides more detailed and useful information when the program was compiled with debugging info. The matching compiler option is usually `-g`, and in the case of code_saturne, is provided using the `--enable-debug` configure option at installation.

When used directly, GDB runs in a single terminal frame, as shown here. Only the current line of code is shown, though the `list` command allows showing more.
\image html dg/gdb_screen.png "GDB in terminal mode" width=80%
When started with the `-tui` option, GDB runs in a split terminal, with source on top, commands on bottom.

- Using the `CTRL+x+o` key combination allows changing focus from one to the other.
- Using the `CTRL+l` key allows refreshing the display.
- Using the `CTRL+x+a` key combination allows switching back to the single terminal. With recent gdb versions, it is not possible to copy or paste text from a split terminal, but it is possible from a single terminal, so switching back and forth may be useful.

\image html dg/gdb_tui_screen.png "GDB with split screen" width=80%
GDB may also be run under Emacs, which provides syntax highlighting of source code.
\image html dg/emacs_gud_screen.png "GDB under Emacs" width=55%
Checking the gdb documentation or tutorials is recommended. The most important gdb command is: `help`

The most commonly used commands are:

- `b`: set a breakpoint.
- `c`: continue.
- `p`: print a variable's value (using the current programming language syntax).
- `up`: move up the stack.
- `down`: move down the stack.

A running program may be paused using CTRL+C.
Variables may not only be printed, but also modified interactively, and functions may be called, making it easy to test some hypothesis without restarting the whole program.
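
As a hedged illustration of the commands above, the following sketch writes a small GDB command file and passes it to GDB using the standard `-x` option. The executable `./my_program`, the function `compute_step`, the variable `n_iter`, and the file name `session.gdb` are hypothetical placeholders, not names taken from code_saturne.

```shell
# Hypothetical names: ./my_program, compute_step, n_iter, session.gdb.
cat > session.gdb << 'EOF'
# Stop when the (hypothetical) function is entered, then start the run.
break compute_step
run
# Show the call stack, move to the caller's frame, then back.
backtrace
up
down
# Print a variable, then modify it without restarting the program.
print n_iter
set var n_iter = 1
# Resume execution until the next breakpoint or program end.
continue
EOF

# Launch GDB with the command file only if both pieces are available.
if command -v gdb > /dev/null 2>&1 && [ -x ./my_program ]; then
  gdb -q -x session.gdb --args ./my_program
else
  echo "gdb or ./my_program not found; command file written to session.gdb"
fi
```

The same commands can of course be typed interactively at the gdb prompt; a command file is mostly useful when the same session setup needs to be repeated.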
Many graphical front-ends are available for gdb. When evaluating a front-end, we recommend checking for the following features:

- Must provide a console to allow combining text-based commands with the graphical elements, or at least easily-accessible widgets in which watchpoints and expressions to print can be typed.
- Should allow some means (such as command-line options) to connect to a GDB server through a socket interface (more on this later).

The DDD (Data Display Debugger) front-end is obsolete and uses a dated graphical toolkit, but has the advantage of combining a command prompt with graphical tools, and is very easy to use, so it might remain an option.
The gdbgui debugger seems promising, and a good potential successor to DDD. It is based on a web-browser interface.
\image html dg/gdbgui_screen.png "gdbgui" width=80%
Full integrated development environments (including Qt Creator, Visual Studio Code, Eclipse, KDevelop, Anjuta) are outside the scope of this documentation. Most members of the code_saturne development team use lighter, less integrated tools, so they will not be able to provide recommendations regarding their use.
The Eclipse CDT and Eclipse PTP (Parallel Tools Platform) environments integrate debuggers, including a parallel debugger, but may use a different syntax than "standalone" GDB, so they are not considered here (though feedback and recommendations around these tools are welcome).
The LLDB debugger is an interesting competitor to GDB, with a different (similar but more verbose) syntax. It is not as widely available yet, and is not yet handled by the code_saturne debug scripts, though a user familiar with it could of course set it up.
The Valgrind tool suite allows the detection of many memory management (and other) bugs.

- Dynamic instrumentation
  - No need for recompilation
  - Usable with any binary, but provides more info (i.e. code line numbers) with code compiled in debug mode
  - Depending on the tool used, run time and memory overhead range from 10x to 100x.
    - With the default tool (Memcheck), 10x to 30x.
  - Use proactively, to detect bugs on small cases, before they become a problem in production cases.
- Valgrind is easy to run:
  - Prefix a standard command with `valgrind`
  - By default, uses the `memcheck` tool.
  - Tool may be changed using `valgrind --tool=<tool>`, where `<tool>` is one of `cachegrind`, `callgrind`, `drd`, `massif`, ...
- Valgrind may be combined with GDB using its `gdbserver` mode.
  - To use this mode, call `valgrind --vgdb-error=<number>`
    - The number represents the number of errors after which the gdbserver is invoked (0 to start immediately).

\image html dg/valgrind_screen.png "Valgrind in a terminal" width=60%
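
The following is a minimal sketch of a typical Memcheck run, assuming a hypothetical executable `./my_program`. The options shown (`--leak-check=full`, `--track-origins=yes`, `--log-file`) are standard Valgrind options, but the selection is only a suggestion and not a code_saturne recommendation.

```shell
# Hypothetical placeholder for the program to check.
PROGRAM=./my_program

if command -v valgrind > /dev/null 2>&1 && [ -x "$PROGRAM" ]; then
  # Memcheck (default tool), with full leak details and origin tracking
  # for uninitialized values; %p in the log name expands to the process id.
  valgrind --tool=memcheck --leak-check=full --track-origins=yes \
           --log-file=valgrind_%p.log "$PROGRAM"
else
  echo "valgrind or $PROGRAM not found; skipping"
fi

# To combine with GDB instead, start the program under Valgrind's gdbserver
# (0 means: wait for GDB before executing the program):
#   valgrind --vgdb-error=0 ./my_program
# then, from another terminal, attach GDB:
#   gdb ./my_program
#   (gdb) target remote | vgdb
```

Writing one log file per process is convenient for parallel runs, where the output of several ranks would otherwise be interleaved.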
Recent versions of the LLVM clang and GCC compilers have additional instrumentation options, allowing memory debugging with a lower overhead than Valgrind.
For the most common errors, use AddressSanitizer, a fast memory error detector.

- For the code_saturne configure options, this means adding

      CFLAGS=-fsanitize=address \
      CXXFLAGS=-fsanitize=address \
      FCFLAGS=-fsanitize=address \
      LDFLAGS=-fsanitize=address

- This may sometimes require specifying `export LD_LIBRARY_PATH=<path_to_compiler_libraries>` when the compiler is installed in a nonstandard path on older systems.
- On some machines, this may be unusable if memory resource limits are set (check using `ulimit -v`).
- Note that the resulting code will not be usable under Valgrind.
- Uninitialized values are not detected by AddressSanitizer (but may be detected by UndefinedBehaviorSanitizer).
- Out-of-bounds errors for arrays on stack (fixed size, usually small) are not detected by Valgrind, but may be detected by AddressSanitizer.
- AddressSanitizer also includes a memory leak checker, which is useful but may also report errors due to system libraries, so to allow a "clean" exit, we may use:

      export ASAN_OPTIONS=detect_leaks=0

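
To illustrate the stack out-of-bounds case mentioned above, here is a small self-contained sketch, independent of code_saturne. It writes a deliberately faulty C program, builds it with `-fsanitize=address`, and runs it. The file names `asan_demo.c`, `asan_demo`, and `asan_demo.log` are hypothetical, and `gcc` is only an assumed default compiler.

```shell
# Write a tiny C program with a deliberate stack out-of-bounds write.
cat > asan_demo.c << 'EOF'
#include <stdio.h>

int main(void)
{
  int a[4] = {0, 0, 0, 0};
  volatile int i = 4;   /* one past the end; volatile limits optimization */
  a[i] = 1;             /* out-of-bounds write on a stack array */
  printf("%d\n", a[0]);
  return 0;
}
EOF

# CC may be overridden by the caller; gcc is only an assumed default.
CC=${CC:-gcc}

if command -v "$CC" > /dev/null 2>&1 \
   && "$CC" -g -O0 -fsanitize=address asan_demo.c -o asan_demo 2> /dev/null; then
  # Disable the leak checker and keep the sanitizer report in a log file.
  ASAN_OPTIONS=detect_leaks=0 ./asan_demo > asan_demo.log 2>&1 \
    || echo "AddressSanitizer reported an error (see asan_demo.log)"
else
  echo "no compiler with AddressSanitizer support found; skipping"
fi
```

When the build succeeds, the report in the log file should point to the faulty line, since the program is compiled with `-g`.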
The UndefinedBehaviorSanitizer instrumentation is also useful to detect other types of bugs, such as division by zero, some memory errors, integer overflows, and more.

- This may sometimes require also specifying `-lubsan`, and even in some cases specifying `LD_LIBRARY_PATH`.
- For the code_saturne configure options, this means adding

      CFLAGS=-fsanitize=undefined \
      FCFLAGS=-fsanitize=undefined \
      LDFLAGS=-fsanitize=undefined

- This may sometimes require specifying `export LD_LIBRARY_PATH=<path_to_compiler_libraries>`, as per AddressSanitizer.

Note that only code compiled with those options is instrumented.
Several ways of setting code_saturne to run under a debugger are possible:

- Using the GUI, set options in `Run computation/Advanced options`
  - The associated help provides several examples
  - This sets a `debug_args` option under the current resources section in the case's `run.cfg` file, which can also be edited directly.
- The same options can be provided directly to `code_saturne run` using the `--debug-args` option

\image html dg/debug_wrapper.png "Example of use of debugger wrapper" width=60%
When the code is run, the debugger will then be launched automatically.
To run the execution under a debugger, a string with the following syntax structure should be used:

    <debugger> [debugger_options] [valgrind [valgrind_options]]

Or, for Valgrind only:

    <valgrind> [valgrind_options]

where < > denote required arguments, and [ ] optional arguments.
The following debuggers and user interfaces are handled: gdb (GNU Debugger), cuda-gdb, cgdb (console front-end to gdb), gdbgui (browser-based front-end to gdb), ddd (Data Display Debugger), emacs (as gdb front-end), kdbg (KDbg), kdevelop (KDE development environment), gede (simple Qt-based gdb GUI), nemiver (GNOME Nemiver debugger), valgrind (Valgrind tools for memory debugging).

If a debugger cannot be found in the PATH, an absolute path should be given instead.
The following debug wrapper options are handled:

| Option | Description |
|--------|-------------|
| `--asan-bp` | adds a breakpoint for gcc's AddressSanitizer |
| `--back-end=gdb` | path to debugger back-end (for graphical front-ends) |
| `--breakpoints=LIST` | comma-separated list of breakpoints to insert |
| `--ranks=LIST` | comma-separated list of MPI ranks to debug |
| `--terminal` | terminal type to use for console debugger |

Other, standard options specific to each debugger may also be used, as long as they do not conflict with options in this wrapper.
To combine a Valgrind tool with another debugger, Valgrind's --vgdb-error=<num>
option should be used, where <num> is the number of errors after which Valgrind's
gdb server should be invoked. This mode is only compatible with the following
debuggers: gdb, gdbgui, ddd, emacs, as the other debugger front-ends do not provide
the required startup options to connect with the gdb server.
Note that compared to a standalone use of the gdb debugger, running under the debug wrapper automatically sets breakpoints at program start and end and launches the program.
To define commands such as setting breakpoints, a small gdb script with
the .gdb extension is generated for each rank. It can be removed safely
after program start.
When running in parallel, several debugging windows will be opened if necessary.
To simply run using the gdb debugger:

    gdb

To run Valgrind's gdb server, stopping at the first error, using gnome-terminal as the terminal:

    --terminal=gnome-terminal gdb valgrind --vgdb-error=1

To run Valgrind's gdb server, stopping at the first error:

    gdb valgrind --vgdb-error=1

To do the same using the ddd front-end:

    ddd valgrind --vgdb-error=1

To run under gdb with preset breakpoints at bft_error and exit functions:

    gdb --asan-bp --breakpoints=bft_error,exit

To run under gdb with a breakpoint for ubsan (UndefinedBehaviorSanitizer):

    gdb --asan-bp --breakpoints=__ubsan::Diag::~Diag

To debug under gdbgui:

    gdbgui

To run under Valgrind's default tool (Memcheck), with a user Valgrind build:

    <path_to_valgrind>

To run under Valgrind's Massif heap profiler:

    valgrind --tool=massif

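
As a sketch of how such a debug string may be passed on the command line, the following assumes that `code_saturne` is in the `PATH`, that the command is run from within a case directory, and that `--debug-args` accepts the `--option=value` form (an assumption, as only the option name is given above).

```shell
# Debug string following the syntax above: gdb with two preset breakpoints.
DEBUG_ARGS="gdb --breakpoints=bft_error,exit"

if command -v code_saturne > /dev/null 2>&1; then
  # Assumed to be run from within a case directory.
  code_saturne run --debug-args="$DEBUG_ARGS"
else
  echo "code_saturne not found; would run:"
  echo "  code_saturne run --debug-args=\"$DEBUG_ARGS\""
fi
```

Quoting the string matters here, since the debugger name and its options must reach the wrapper as a single argument.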
To allow for debugging parallel runs and combining GDB and Valgrind, GDB is run under a new terminal.

- The type of terminal chosen can be defined using the `--terminal` option.
  - Known issue: on some Debian 10-based systems, running under `gnome-terminal` crashes GDB. Running under the default `xterm` or `konsole` works fine.

By default, xterm will be used.
This usually leads to very small, hard to read fonts. This can be fixed by editing `$HOME/.Xresources`, as in the following example:

    !xterm*font: *-fixed-*-*-*-18-*
    xterm*faceName: Liberation Mono:size=10:antialias=false
    xterm*font: 7x13
    xterm*VT100.geometry: 120x60
    URxvt*geometry: 120x60
    URxvt.font: xft:Terminus:antialias=false:size=10

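
Changes to `$HOME/.Xresources` are usually only read when the X session starts. As a minimal sketch, they can be applied to the running session with the standard `xrdb` tool (assuming it is installed and an X display is available); only terminals opened afterwards pick up the new settings. The `XRDB_STATUS` variable is a hypothetical name used for illustration.

```shell
# Reload X resources so that newly opened xterm windows use the new settings.
if command -v xrdb > /dev/null 2>&1 \
   && [ -n "${DISPLAY:-}" ] && [ -f "$HOME/.Xresources" ]; then
  xrdb -merge "$HOME/.Xresources"
  XRDB_STATUS="merged"
else
  XRDB_STATUS="skipped"
  echo "xrdb, DISPLAY, or ~/.Xresources not available; skipping"
fi
```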
Starting the debugger manually in an execution directory avoids creating many directories and waiting for pre-processing before each run.

- `cd` to the run directory under `RESU/<run_id>`.
- To determine the code options already configured, run `cat run_solver` to view the execution commands.
- Add the debugger commands to this to run (unless already done through the GUI or user script).
  - To make this easier, code_saturne provides a `cs_debug_wrapper.py` script, in the `python/code_saturne/base` directory of the source tree (and in the `lib/python<version>/site-packages/code_saturne` directory of an installed build).
  - Run `cs_debug_wrapper.py --help` for instructions.
- The XML file may be modified directly using `code_saturne gui <file>` (ignoring the directory warning).
  - If mathematical expressions are modified, an additional step is required; in this case, it is simpler to generate a new run.
- When modifying user-defined functions, do not forget to run `code_saturne compile -s src_saturne` to update the `cs_solver` executable.

The code_saturne debug wrapper does not yet support launching GDB under Vim or Neovim.
Various examples of use of debugging with Vim are found on the web, explaining how Termdebug for example can be used.
Support for these tools could be added if users can provide a command-line example for launching a debugging session with their favorite editor and help test this.
Debugging parallel code_saturne runs is not very different from debugging serial runs, though stepping through a program may be more cumbersome unless a parallel debugger is used.
If a true parallel debugger such as TotalView or Arm DDT is available, the following procedure may be used:

- First, initialize the execution directory, using one of the following methods:
  - From the GUI, in the advanced run/job submission options, check "initialize only", then submit the computation.
  - Outside the GUI, run `code_saturne submit --initialize`

  In either case, the code will prepare the execution directory, and preprocess the mesh if needed, but not remove the executable and temporary script.
- Once the stage has finished, `cd` to the execution directory, and edit the `run_solver` script:
  - Add commands necessary to load the debugger's environment. For example, on the EDF Cronos cluster, this requires adding `module load arm-forge` in the section where other modules are added.
  - Search for the line actually launching the solver, near the end of the script; before the `cs_solver` command, insert the debugger command. For example, for the DDT debugger, replace `mpiexec <options> ./cs_solver` with `mpiexec ddt <options> ./cs_solver`.
- If using a batch system, request an interactive allocation (for example, using `salloc` with SLURM). **This step is important: without this, you may be trying to run a large job on a front-end node, and your cluster administrators will not be happy.**
- You can then run `./run_solver`, which will launch the code under the interactive debugger.

Debuggers such as DDT allow easy switching between global and local stepping through the program and exploring its data, using a single window, as illustrated in this screenshot:
\image html dg/ddt_pause.png "Global DDT pause" width=80%
Remarks:

- On some clusters using `srun` as the configured `mpiexec` command, the `srun` command configuration may not be compatible with DDT. In this case, switch to another appropriate `mpiexec` command.
- In case launching the debugger fails, an alternative option is to simply replace the whole launch command (the line containing `cs_solver`) with the command launching the debugger's GUI, for example `ddt`, and then simply follow the debugger prompts for command launch options, as shown in the example below.

\image html dg/ddt_launch.png "Example DDT launch options" width=80%
When no true parallel debugger is available, serial debuggers may be used.

- Usually one for each process, though using multiple program features allows running only selected ranks under a debugger.
  - For example: `mpiexec -n 2 <program> : -n 1 <debug_wrapper> <program> : -n 3 <program>` to debug rank 2 of 6
- The execution may not be restarted from the debugger; the whole parallel run must be restarted.
  - Very painful if not automated.
- This is where the `cs_debug_wrapper.py` script really becomes useful.
  - This script also includes a `--ranks` filter option so as to call the debugger only on selected ranks. For example, using `--ranks=2,5` will launch MPI ranks 2 and 5 under a debugger, while other ranks will be run normally.
- For code_saturne under GDB, to determine a given process's rank, type: `print cs_glob_rank_id`

Debugging OpenMP data races is much more tricky.

- Most errors are due to missing `private` attributes in OpenMP pragmas.
  - Using local variable declarations avoids most of these, as those variables are automatically thread-private.
  - Using `parallel_for` constructs from `cs_dispatch` avoids even more thread race issues, as most local variables captured in lambda functions will be read-only, so for example simply incrementing a counter defined outside the loop (instead of using a proper reduction construct) will lead to a compilation error.
- Valgrind's DRD (Data Race Detector) tool is not of much use here, as it generates too many false positives with current OpenMP implementations (even when disabling the use of Linux futexes with a dedicated GCC build).
- ThreadSanitizer seems to work best here. For proper OpenMP debugging, LLVM-based compilers also require the `archer` library, which is bundled with the Intel oneAPI compilers (versions 2024.0 and above), and clang versions 10 or above. Note that on some systems with fine-grained packaging, such as Debian, clang might be available without OpenMP support (which requires `libomp` to be installed).

  We have successfully used ThreadSanitizer with the oneAPI compilers (tested with oneAPI 2025.2). Builds with standard clang 19 and 20 seem to fail on startup, and those with gcc 12 report false positives and then hang. So using the oneAPI compilers is recommended for OpenMP debugging.
  - For the code_saturne configure options, this means using

        CFLAGS=-fsanitize=thread \
        CXXFLAGS=-fsanitize=thread \
        LDFLAGS=-fsanitize=thread

  - At runtime, one should also set the following option to avoid false positives.

        export TSAN_OPTIONS='ignore_noninstrumented_modules=1'

When having or suspecting issues with loading or selection of shared libraries, use

    export LD_DEBUG=libs

before running the code from a terminal. This will log many operations of the dynamic loader.

To obtain more info on available options, use `LD_DEBUG=help` with any program, for example:

    LD_DEBUG=help cat

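
Since the loader writes its messages to standard error, it can help to capture them in a file. The sketch below does this for a trivial command on a glibc-based Linux system (other C libraries may ignore `LD_DEBUG`). The log name `ld_debug.log` and the `./cs_solver` path are assumptions for illustration.

```shell
# Log dynamic loader library searches for a trivial command.
# LD_DEBUG output goes to standard error by default.
LD_DEBUG=libs ls > /dev/null 2> ld_debug.log

# Show the first loader messages mentioning the C library, if any.
grep -i "libc" ld_debug.log | head -n 5 || true

# For an actual executable (assumed path), ldd lists the resolved libraries.
if [ -x ./cs_solver ]; then
  ldd ./cs_solver
fi
```

Comparing the `ldd` output with the logged search paths usually shows quickly whether an unexpected library version is being picked up.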