At some point while developing a MOOSE-based application you will probably need to use a debugger. A debugger will allow you to stop your program at certain points (or if things happen such as a memory segfault). Once stopped you can then carefully step through the program, inspecting variable values as you go in order to find the source of the problem.
In particular, if you ever see a "Segfault" or a "Signal 11" that means it's time to pull out the debugger. Any debugger will automatically stop once a segfault is reached, showing you exactly where the invalid memory access occurred.
For a good tutorial on debugging see Example 21
The first step to debugging anything is to build a debug executable. By default MOOSE-based applications are built in "optimized" (
opt) mode. That ensures the fastest solves. However, an optimized executable is missing a lot of information that is useful to a debugger and the optimization process itself can cause code to get reordered (or even skipped!) making it difficult to step through a program.
To build an executable suitable for debugging you need to set the
METHOD environment variable to
dbg. You can
export it in your environment but it's usually simpler to use a UNIX shortcut that allows you to define environment variables at the same time you run a command, like so:
METHOD=dbg make -j 8
Always remember that the
8 should be modified to reflect the number of processors you want to use for the build (usually the number of cores in your computer).
Once the build is complete you should end up with a "debug executable" that look like:
yourapp-dbg. That executable is perfect for loading into a debugger. However, that executable will run VERY slowly - so make sure that before you begin debugging you come up with a problem that is as small as possible but still shows the problem you're trying to fix.
Many different debuggers exist:
ddd, Totalview and Intel Debugger are just a few. While a command-line debugger (like
gdb) might seem daunting at first, they are an invaluable tool for quick debugging and debugging in complicated scenarios such as when you're running on a cluster. Learning one should be essential to any computational scientist.
For debugging MOOSE-based applications we recommend
lldb if you're using the clang compiler (default on Mac OSX) and
gdb for the
gcc compiler (default on Linux).
LLDB and GDB
gdb are very similar. They work on the "command-line" taking text input and moving through your program as it executes.
With our MOOSE package on Mac OSX you actually need to run lldb using
sudo so it has the elevated privileges it needs to attach to your program. To invoke
lldb with a MOOSE-based application on Mac OSX you would do:
sudo lldb -- ./yourapp-dbg -i inputfile.i
-- tells lldb that any command-line options after that point need to be passed to the executable you are running.
sudo will ask you for your password so that it can elevate the priveleges of
lldb. You can also look below to see how to allow
lldb to run with sudo without using a password.
gdb can be run with a similar command:
gdb --args ./yourapp-dbg -i inputfile.i
(On Linux this will generally work, without the need for
Once this is done, your executable will be loaded but won't start running. This is an opportune time to set breakpoints. We usually recommmend setting a breakpoint on
MPI_Abort using the
Then to start running your executable use the
r command (type
r then hit enter).
If a breakpoint (or fault) is reached: I recommend first using the
bt command to output a "back-trace" so you can see exactly what the current call-stack looks like to figure out where you are.
To quit the debugger use
Ctrl+d (by the way: that's a normal way to quit lots of command-line interpreters on Unix - including Python).
Another useful command is
p (for 'print') which allows you to print out the value of a variable. Just do:
You can learn about the full set of commands by using
help or looking at any number of tutorials online.
Firstly, if you don't have to debug in parallel DON'T! Only do parallel debugging when you have a problem that can only be reproduced when running in parallel. If your problem will show up in serial, it is MUCH easier to debug in serial. If it takes a long time for your problem to run so you are wanting to run it in parallel: don't do that... instead, try to make your problem smaller so that you can debug it in serial.
With all of that said: if you actually do need to debug in parallel, MOOSE has a couple of command-line arguments to help. However, before we get there, we need to do a bit of setup on Mac OSX:
Mac OSX Parallel Debugging Setup
As noted above, when running
lldb on Mac OSX we need to run with
sudo to give it permission to attach to our running program.
sudo will ask for your password, but unfortunately when doing parallel debugging there is no way to enter that password. Therefore, we need to make it so that you can run
sudo without a password.
First thing is to get the full path to where
lldb is using the command-line:
If you are using our package on OSX this should return something like
/opt/moose/llvm-5.0.1/bin/lldb. Make note of this location (copy it, or set that Terminal aside) because you'll need it in the next step.
lldb to be able to run with
sudo without a password we need to modify the
sudoers file. To do that issue this command in a terminal (preferably a new one so you can still see the path to
lldb in the first one):
Most-likely this is going to use
vi (a text editor) to open the
sudoers file (I say "most-likely" because it technically can open any editor based on the
EDITOR environment variable). If you are unfamiliar with
vi don't worry. Just press
i to go into "insert" mode. Navigate to the bottom of the file and add a line that looks like:
username ALL= NOPASSWD: /opt/moose/llvm-5.0.1/bin/lldb
username MUST be replaced with your username! And the path to
lldb needs to reflect what you got back from
which lldb above. Once that line is in place press
Esc (to exit "insert mode") then type
:wq (that's a
q - it's a command that says "write and quit") and press
Once you've completed that you should be able to run
sudo lldb on the command-line and not need to enter your password. If you still need to enter your password, email
moose-users so we can figure out what's wrong before you go further.
Actually Parallel Debugging
With all of that setup we are ready to debug in parallel. There are two options:
1. Launch a terminal for each MPI process
What we're going to do is launch a terminal (using
xterm) for each MPI process. That terminal will run our debugger and attach to the running MPI process. For this to work, you either need to be on your local box or have X-forwarding set up over SSH (which I'm not going to go into here).
Let's assume that you're working on your local Mac workstation using our package. In order to launch your program with 4 MPI processes and 4 terminals for debugging you would do:
mpiexec -n 4 ./yourapp-dbg -i inputfile.i --start-in-debugger='sudo lldb'
(If you are on Linux - most-likely you will want to put
sudo lldb is)
If everything is setup correctly you should see 4 XTerm windows show up with
lldb command-line prompts. Those debugger prompts are already attached to your running executable, but the executable is paused. This is an opportune time to set breakpoints, but you have to do it in each terminal seperately. For instance, you might want to go through each one and do:
to be able to stop if MOOSE encounters an error.
Once you are ready to continue - you do just that. Use the
c command (type
c and hit
Enter) in each terminal window to tell it to continue. Once you go through all of the open terminal windows you should see your application start to run in your original terminal. If a breakpoint (or any fault) is reached on any process that terminal window will show the command-prompt again, allowing you to inspect variables, etc.
Passing debugger commands on the command line
lldb accepts debugger commands through the
-o command line option that are executed as soon as the execuatble is loaded up and ready. This can be used to set breakpoints and immediately resume the execution of the app.
mpirun -n 4 ./yourapp-dbg -i inputfile.i --start-in-debugger "sudo lldb -o 'break set -E C++' -o cont"
The above command will set breakpoints that halt on thrown exceptions. Replace
break set -E C++ with
b MPI_Abort to break on MOOSE errors. The
-o cont option will automatically run the app after the breakpoints are set.
2. Launch your application and tell it to wait so you can manually attach a debugger
This is going to be used in cases where you need to debug using LOTS of MPI processes, but you don't want a terminal window for each one. This is also handy if you're working on a cluster that doesn't have any way of doing X-forwarding for option #1 to work. The idea is to launch your application and have it wait during the initialization phase so you have time to attach a debugger manually to one of the processes.
To do this simply run your application like so:
mpiexec -n 4 ./yourapp-dbg -i inputfile.i --stop-for-debugger
This will cause your application to print something like the following and then wait 30 seconds (by default - if you need more time use
75 is the number of seconds you want it to wait):
> mpiexec -n 4 ../../../moose_test-opt -i simple_diffusion.i --stop-for-debugger=2 Stopping for 2 seconds to allow attachment from a debugger. All of the processes you can connect to: rank - hostname - pid 0 - dereksmacpro.local - 53403 1 - dereksmacpro.local - 53404 2 - dereksmacpro.local - 53405 3 - dereksmacpro.local - 53406 Waiting...
The message there is telling you where each process is running and what its "process ID" (
pid) is. That is the relevant information you need to be able to attach a debugger to that process. In this case (on my local Mac) if I want to connect to "rank 2", in a separate Terminal I would do:
sudo lldb -p 53405
gdb has a similar mechanism, see its docs)
That will launch
lldb and attach to my running program. Attaching to the program "pauses" it - allowing me to set breakpoints and then use the
c command to tell it to continue.