Troubleshooting
This page collects issues that most of us run into at some point. Use the navigation list on the right to start with the section matching the issue you are experiencing. The document is designed so that you will 'jump' between sections: fixing one issue often sends you to the section that addresses its underlying cause.
Mailing List
If your issue cannot be solved here, please submit your question to our mailing list for help!
Modules
Modules allow users to control which libraries and binaries are made available within a terminal session. Module commands only affect the terminal in which they are run; they are not global. This is why we routinely ask users to operate in a single terminal while troubleshooting issues.
Users who have installed one of our moose-environment packages will have access to modules. Please familiarize yourself with some commonly used module commands:
Command | Command Arg | Usage |
---|---|---|
module | list | List currently loaded modules |
module | avail | List available modules |
module | load <module module module> | Load a space separated list of modules |
module | purge | Remove all loaded modules |
To begin working with modules manually, it is best to start clean (especially for the duration of this FAQ). You can do so by purging any current modules loaded:
module purge
Load the two default modules pertaining to your operating system:
Linux:
module load moose-dev-gcc moose-tools
Macintosh:
module load moose-dev-clang moose-tools
Loading these two modules will in turn load other necessary modules. The loaded modules should approximately resemble the following list:
Linux:
module list
Currently Loaded Modulefiles:
 1) moose/.gcc-9.2.0                               5) moose/.cppunit-1.12.1-gcc-9.2.0
 2) moose/.mpich-3.3-gcc-9.2.0                     6) moose-dev-gcc
 3) moose/.petsc-3.11.4-mpich-3.2_gcc-9.2.0-opt    7) miniconda
 4) moose/.tbb-2018_U3                             8) moose-tools
Macintosh:
module list
Currently Loaded Modulefiles:
 1) moose/.gcc-9.2.0                                 6) moose/.cppunit-1.12.1-clang-9.0.0
 2) moose/.clang-9.0.0                               7) moose-dev-clang
 3) moose/.mpich-3.3-clang-9.0.0                     8) miniconda
 4) moose/.petsc-3.11.4-mpich-3.3-clang-9.0.0-opt    9) moose-tools
 5) moose/.tbb-2018_U3
If your terminal mirrors the above (version numbers may vary slightly), then you have a proper environment. Return to the section that sent you here and continue troubleshooting.
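As a rough aid for comparing your terminal against the lists above, the following sketch reports which expected modules are absent from a module listing. The function name missing_modules is hypothetical (it is not part of the moose-environment package), and the 2>&1 redirection in the usage comment assumes your Modules version prints its listing to stderr, as many Environment Modules releases do.

```shell
# Print each expected module name that does NOT appear in a `module list`
# listing. The listing is read from stdin, so the check can be tried on
# saved output as well as on a live terminal.
missing_modules() {
  listing=$(cat)
  for name in "$@"; do
    # Look for the name as a whole field, e.g. "6) moose-dev-gcc".
    # Dots in a name are treated as regex wildcards, so the match is loose.
    if ! printf '%s\n' "$listing" |
        grep -Eq "(^|[[:space:]])${name}([[:space:]]|\$)"; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage on Linux (assumption: the listing is written to stderr):
#   module list 2>&1 | missing_modules moose-dev-gcc moose-tools miniconda
```

No output means every name was found. On Macintosh, substitute moose-dev-clang for moose-dev-gcc.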
If you find yourself looping through this troubleshooting guide without solving your issue, there is one more thing to try: start over. This time, run the following before starting over:
env -i bash
export PATH=/usr/bin:/usr/sbin:/sbin
source /opt/moose/Modules/4.3.1/init/bash
These three commands start a new command interpreter without any of your default environment. This is important because most of the errors we end up solving are caused by something in the user's environment.
Note that if this solves your issue, something in one of possibly many bash profiles is getting in the way. At this point, you will want to reach out to our mailing list and ask for help tracking it down. Depending on the situation, you may be asked to contact the administrators of the machine you are working on (HPC clusters, for example, are beyond our control).
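If the clean shell did fix things, a reasonable first step is to find out which bash startup files you actually have. The sketch below is only an illustration: list_bash_profiles is a hypothetical helper name, and the four file names are the per-user startup files that bash itself consults. System-wide files such as /etc/profile are not covered.

```shell
# Print the per-user bash startup files that exist under a home directory.
# Defaults to $HOME when no directory is given.
list_bash_profiles() {
  home_dir=${1:-$HOME}
  for f in .bash_profile .bash_login .profile .bashrc; do
    if [ -f "$home_dir/$f" ]; then
      printf '%s\n' "$home_dir/$f"
    fi
  done
}

# Usage: look inside each file for lines that touch PATH or load modules.
#   for f in $(list_bash_profiles); do grep -nE 'PATH|module' "$f"; done
```

Any hits are worth including when you write to the mailing list.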
The modules contained in the moose-environment package are built in a hierarchical directory structure (some modules may not be visible until other modules are loaded).
Compiling libMesh
Compiling libMesh requires a proper environment. Let's verify a few things before attempting to build it (or possibly rebuild it in your case):
Verify you have a proper compiler present:
Linux:
which $CC
/opt/moose/mpich-3.3/gcc-9.2.0/bin/mpicc
mpicc -show
gcc -I/opt/moose/mpich-3.3/gcc-9.2.0/include -L/opt/moose/mpich-3.3/gcc-9.2.0/lib -Wl,-rpath -Wl,/opt/moose/mpich-3.3/gcc-9.2.0/lib -Wl,--enable-new-dtags -lmpi
which gcc
/opt/moose/gcc-9.2.0/bin/gcc
Macintosh:
which $CC
/opt/moose/mpich-3.3/clang-9.0.0/bin/mpicc
mpicc -show
clang -Wl,-commons,use_dylibs -I/opt/moose/mpich-3.3/clang-9.0.0/include -L/opt/moose/mpich-3.3/clang-9.0.0/lib -lmpi -lpmpi
which clang
/opt/moose/llvm-9.0.0/bin/clang
What you are looking for is that which and mpicc -show are returning proper paths. If these paths are not set, or which is not returning anything, see Modules for help on setting up a proper environment. Once set up, return here and verify the above commands return the proper messages.
Check that PETSC_DIR is set and does exist:
Linux:
echo $PETSC_DIR
/opt/moose/petsc-3.11.4/mpich-__MPICH___gcc-9.2.0-opt
file $PETSC_DIR
/opt/moose/petsc-3.11.4/mpich-__MPICH___gcc-9.2.0-opt: directory
Macintosh:
echo $PETSC_DIR
/opt/moose/petsc-3.11.4/mpich-__MPICH___clang-9.0.0-opt
file $PETSC_DIR
/opt/moose/petsc-3.11.4/mpich-__MPICH___clang-9.0.0-opt: directory
If echo $PETSC_DIR returns nothing, your environment is not complete. See Modules for help on setting up a proper environment. Once set up, return here and verify the above commands return the proper messages.
If file $PETSC_DIR returns an error (possible if you are performing a Manual Install), it would appear you have not yet run configure. Configure builds this directory.
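The compiler and PETSC_DIR checks above can be rolled into one quick test. This is a minimal sketch rather than an official MOOSE script: check_build_env is a hypothetical name, and it only confirms that $CC resolves to an executable and that $PETSC_DIR is an existing directory. It does not inspect the mpicc -show output.

```shell
# Report problems with the variables libMesh needs before a build.
# Returns 0 when CC resolves to a command and PETSC_DIR is a directory.
check_build_env() {
  status=0
  if [ -z "${CC:-}" ] || ! command -v "$CC" >/dev/null 2>&1; then
    echo "CC is unset or does not resolve to a command"
    status=1
  fi
  if [ -z "${PETSC_DIR:-}" ]; then
    echo "PETSC_DIR is not set"
    status=1
  elif [ ! -d "$PETSC_DIR" ]; then
    echo "PETSC_DIR does not exist: $PETSC_DIR"
    status=1
  fi
  return $status
}

# Usage:
#   check_build_env && echo "environment looks usable"
```

A message about CC or an unset PETSC_DIR points back to the Modules section; a missing PETSC_DIR directory points to the configure step mentioned above.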
With the above all taken care of, try to build libMesh:
cd moose/scripts
./update_and_rebuild_libmesh.sh
If you encounter errors during this step, we would like to hear from you! Please seek help on our mailing list. Provide the diagnostic and libmesh configure logs. Those two files can be found in the following locations:
moose/libmesh/build/config.log
moose/scripts/libmesh_diagnostic.log
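Before writing to the mailing list, it can help to confirm that both log files exist. The helper below is a hypothetical convenience (find_libmesh_logs is not a MOOSE command); it takes the path to your moose checkout and uses only the two relative paths listed above.

```shell
# Report whether each libMesh log named above exists under a moose checkout.
find_libmesh_logs() {
  moose_dir=$1
  for log in libmesh/build/config.log scripts/libmesh_diagnostic.log; do
    if [ -f "$moose_dir/$log" ]; then
      printf 'found   %s\n' "$moose_dir/$log"
    else
      printf 'missing %s\n' "$moose_dir/$log"
    fi
  done
}

# Usage, from the directory that contains your moose checkout:
#   find_libmesh_logs moose
```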
If libMesh built successfully, return to the beginning of the step that led you here, and try that step again.
Build Issues
Build issues are normally caused by an invalid environment, or by a repository update that left a mismatch between MOOSE and your application, or between either of those and the libMesh submodule.
Verify you have a functional Modules environment.
Verify the MOOSE repository is up to date, with the correct vetted version of libMesh:
Warning: Before performing the following commands, be sure you have committed your work, because we are about to delete stuff!
cd moose
git checkout master
git clean -xfd
<output snipped>
git fetch upstream
git pull
git submodule update --init
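Because git clean -xfd deletes untracked and ignored files, it is worth checking for uncommitted work first. The following sketch uses only standard git options; worktree_is_clean is a hypothetical helper name and is not part of the MOOSE scripts.

```shell
# Succeed only when the repository at the given path has nothing modified,
# staged, or untracked. Ignored files are not reported by --porcelain.
worktree_is_clean() {
  [ -z "$(git -C "$1" status --porcelain)" ]
}

# Usage:
#   worktree_is_clean moose || echo "commit or stash your work first"
#
# To preview what the clean step would delete without deleting anything,
# add -n (dry run):
#   git clean -xfdn
```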
Verify you either have no moose directory set, or it is set correctly.
[~] > echo $MOOSE_DIR
[~] >
The above should return nothing, or, it should point to the correct moose repository.
Note: Most users do not use or set MOOSE_DIR. If the above command returns something, and you are not sure why, just unset it:
unset MOOSE_DIR
Try building a simple hello world example (copy and paste the entire box):
cd /tmp
cat << EOF > hello.C
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);

  // Get the number of processes
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);

  // Get the rank of the process
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Get the name of the processor
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  // Print off a hello world message
  printf("Hello world from processor %s, rank %d out of %d processors\n",
         processor_name, world_rank, world_size);

  // Finalize the MPI environment.
  MPI_Finalize();
}
EOF
mpicxx -fopenmp hello.C
If the build failed, and you have the correct Modules environment loaded, then you should attempt the 'Uggh! None of this is working' step in the Modules section. It would seem there is something else in your environment that is preventing you from compiling simple programs.
If the build was successful, attempt to execute the hello world example:
mpiexec -n 4 /tmp/a.out
You should receive a response similar to the following:
Hello world from processor my_hostname, rank 0 out of 4 processors
Hello world from processor my_hostname, rank 1 out of 4 processors
Hello world from processor my_hostname, rank 3 out of 4 processors
Hello world from processor my_hostname, rank 2 out of 4 processors
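The ranks may print in any order, so comparing the output by eye gets tedious with larger process counts. As a small illustration (all_ranks_reported is a hypothetical helper, and it assumes the exact message format printed by the hello world example above), the following counts the distinct ranks that reported in:

```shell
# Read mpiexec output on stdin and succeed when the number of distinct
# ranks that printed a hello message equals the expected process count.
all_ranks_reported() {
  expected=$1
  # Pull out each rank number, de-duplicate numerically, and count them.
  seen=$(sed -n 's/.*rank \([0-9][0-9]*\) out of.*/\1/p' |
         sort -un | wc -l | tr -d ' ')
  [ "$seen" -eq "$expected" ]
}

# Usage:
#   mpiexec -n 4 /tmp/a.out | all_ranks_reported 4 && echo "MPI launch OK"
```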
If all of the above has succeeded, you should attempt to rebuild libMesh again.
Failing Tests
If many or all tests are failing, there is a good chance the fix is simple. Follow these steps to narrow down the possible cause.
First, run a test that should always pass:
cd moose/test
make -j 8
./run_tests -i always_ok -p 2
If make -j 8 fails, please proceed to Build Issues above. This may also be the reason why all your tests are failing.
If this test passes, it proves that the TestHarness is available, that libMesh is built, and that the TestHarness has a working MOOSE framework available to it. This means your failing test may be beyond the scope of this troubleshooting guide. However, do continue to read through the bolded situations below. If the error is not listed, please submit your failed test results to our mailing list for help.
If the test did fail, chances are your test and our test are failing for the same reason:
Environment variables are somehow instructing the TestHarness to use improper paths. Try each of the following and re-run your test. You may find you receive a different error each time. Simply continue troubleshooting using that new error, and work your way down. If the error is not listed here, then it is time to ask the mailing list for help:
Check if echo $METHOD returns anything. If it does, try unsetting it with unset METHOD. If this was set to anything other than opt, it will be necessary to rebuild moose/test again:
cd moose/test
make -j 8
Check if echo $MOOSE_DIR returns anything. If it does, try unsetting it with unset MOOSE_DIR. If this was set to anything, you must rebuild libMesh.
Check if echo $PYTHONPATH returns anything. If it does, try unsetting it with unset PYTHONPATH.
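The three checks above can be done in one pass. A minimal sketch, assuming nothing beyond the three variable names already mentioned (report_env_overrides is a hypothetical name):

```shell
# Print NAME=value for each of the three variables that is set and
# non-empty. No output means none of them is influencing the TestHarness.
report_env_overrides() {
  for var in METHOD MOOSE_DIR PYTHONPATH; do
    # Indirect lookup of the variable named by $var (the names are fixed
    # above, so eval is not handling any user input here).
    eval "value=\${$var:-}"
    if [ -n "$value" ]; then
      printf '%s=%s\n' "$var" "$value"
    fi
  done
}

# Usage:
#   report_env_overrides
```

Unset whatever it prints, then follow the rebuild advice given for that variable.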
Failed to import hit:
Verify you have the miniconda package loaded. See Modules.
If it was not loaded, and now it is, it may be necessary to rebuild moose:
cd moose/test
make -j 8
No Modulefiles Currently Loaded
Verify you have modules loaded. See Modules
Application not found
Your application has not yet been built. You need to successfully perform a make. If make is failing, please see Build Issues above.
Perhaps you have specified invalid arguments to run_tests? See TestHarness More Options, specifically for help with:
--opt
--dbg
--oprof
gethostbyname failed, localhost (errno 3)
This is a fairly common occurrence, which happens when your internal network stack or route is not correctly configured for the local loopback device. Thankfully, there is an easy fix:
Obtain your hostname:
hostname
mycoolname
Linux & Macintosh: Add the result of hostname to your /etc/hosts file, like so:
sudo vi /etc/hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.0.1 mycoolname # <--- add this line to the end of your hosts file
Everyone's hosts file is different, but the result of adding the line described above will be the same.
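To check whether the entry is already present (or to confirm your edit took effect), something like the following can be used. It is a sketch only: hostname_in_hosts is a hypothetical helper, and it simply looks for the name as a whole word after the address on any uncommented line.

```shell
# Succeed when the given name appears as a host entry in a hosts file.
# The file defaults to /etc/hosts.
hostname_in_hosts() {
  name=$1
  hosts_file=${2:-/etc/hosts}
  # Drop comments, then scan every field after the address column.
  sed 's/#.*//' "$hosts_file" |
    awk -v n="$name" '
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }'
}

# Usage:
#   hostname_in_hosts "$(hostname)" || echo "hostname missing from /etc/hosts"
```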
Macintosh only, 2nd method:
sudo scutil --set HostName mycoolname
We have received reports where this method sometimes does not work.
TIMEOUT
If your tests fail due to timeout errors, it is most likely that you have a good installation but a slow machine (or slow filesystem). You can adjust the amount of time that the TestHarness allows a test to run to completion by adding a parameter to your test file:
[Tests]
  [./timeout]
    type = RunApp
    input = my_input_file.i
    max_time = 300 # time in seconds before a timeout occurs. 300 is the default for all tests.
  [../]
[]
CRASH
A crash indicates the TestHarness executed your application (correctly), but then your application exited with a non-zero return code. See Build Issues above for a possible solution.
EXODIFF
An exodiff indicates the TestHarness executed your application, and your application exited correctly. However, the generated results differ from the supplied gold file. If this test passes on some machines and fails on others, you may have applied too tight a tolerance to the acceptable error values for that specific machine. We call this phenomenon machine noise.
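To see why a tight tolerance turns machine noise into a failure, consider a generic relative-error check. This is an illustration of the idea only: it is not how exodiff is implemented, and within_rel_tol and the sample numbers are made up for the example.

```shell
# Succeed when |value - gold| <= tol * |gold|.
# Arguments: computed value, gold value, relative tolerance.
within_rel_tol() {
  awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
    diff = a - b; if (diff < 0) diff = -diff
    ref = (b < 0) ? -b : b          # magnitude of the gold value
    exit !(diff <= tol * ref)
  }'
}

# A result that differs from the gold value by about 1e-7 (relative):
#   within_rel_tol 1.0000001 1.0 1e-6   # succeeds: inside the tolerance
#   within_rel_tol 1.0000001 1.0 1e-9   # fails: tolerance tighter than the noise
```

If two machines disagree in the seventh digit, the first tolerance passes on both while the second fails on one of them.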
CSVDIFF
A csvdiff follows the same error checking paradigm as an exodiff test, but compares a different file format.
MacOS Catalina Caveats
GCC is not completely functional.
There appears to be an issue/bug with Xcode which prevents many GNU sources from compiling outside of the Xcode IDE. There is a work-around, but unfortunately it requires some know-how on your part. The stdlib.h header file, located at /usr/include within the base path reported by xcrun --show-sdk-path, needs to be patched. The following represents the required change:
diff --git a/usr/include/stdlib.h b/usr/include/stdlib.h
index 035e6c0..035bb92 100644
--- a/usr/include/stdlib.h
+++ b/usr/include/stdlib.h
@@ -58,8 +58,8 @@
 #ifndef _STDLIB_H_
 #define _STDLIB_H_
 
-#include <Availability.h>
 #include <sys/cdefs.h>
+#include <Availability.h>
 
 #include <_types.h>
 #if !defined(_ANSI_SOURCE)
Basically, we want to swap the includes so that #include <sys/cdefs.h> is included before #include <Availability.h>. Keep in mind, this change will need to be reapplied each time Catalina goes through an Xcode update. At the time of this writing, the latest Xcode (11.2) is affected.
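Since an Xcode update silently reverts the patch, a quick check of the include order can tell you whether it needs to be reapplied. The helper below is a hypothetical sketch (stdlib_is_patched is not an Apple or MOOSE tool); it only compares the line numbers of the two include directives named above.

```shell
# Succeed when <sys/cdefs.h> is included before <Availability.h> in the
# given header, which is the order the patch above produces.
stdlib_is_patched() {
  header=$1
  cdefs=$(grep -n '^#include <sys/cdefs.h>' "$header" | head -n 1 | cut -d: -f1)
  avail=$(grep -n '^#include <Availability.h>' "$header" | head -n 1 | cut -d: -f1)
  [ -n "$cdefs" ] && [ -n "$avail" ] && [ "$cdefs" -lt "$avail" ]
}

# Usage on macOS:
#   stdlib_is_patched "$(xcrun --show-sdk-path)/usr/include/stdlib.h" \
#     || echo "stdlib.h needs the patch"
```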