Case of a failed otbApplication test that freezes when MPI is on
Description
I have an OTB application, run through the test driver, that freezes in MPI_Finalize() when an exception is thrown from MyOTBApp::DoExecute().
Steps to reproduce
The test is launched with:
otbTestDriver "Execute" "/path/to/bin/otbApplicationLauncherCommandLine" "MyOtbApp" "the" "parameters" "-testenv"
If no exception is thrown, MPI_Finalize() returns correctly.
If an (ITK) exception interrupts MyOTBApp::DoExecute(), the process hangs in:
(gdb) bt
#0 0x00007f6af1bee56d in nanosleep () from /lib64/libc.so.6
#1 0x00007f6af1c1f404 in usleep () from /lib64/libc.so.6
#2 0x00007f6af24da7cf in ompi_mpi_finalize () from /path/to/lib/libmpi.so.20
#3 0x00007f6af6075b85 in otb::MPIConfig::terminate() () from /other/path/to/otb/lib/libOTBMPIConfig-6.7.so.1
I haven't been able to work out which (pending?) message, or whatever else, is preventing MPI_Finalize() from returning. I could call MPI_Abort() just before the return -1 in otbTestMain.h (BTW, why isn't it return result;?), which shouldn't be a big issue given that we are in a failing test. But is it the right approach? (I'm not an MPI expert.)
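To make the question concrete, here is a minimal sketch of the workaround, not the actual content of otbTestMain.h. MpiShutdown and finish_test are hypothetical names introduced for illustration. The two callbacks stand in for MPI_Finalize() and MPI_Abort(MPI_COMM_WORLD, code), so the decision logic can be compiled and checked without an MPI runtime.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>

// Hypothetical seam around the two MPI shutdown calls. In a real build the
// callbacks would wrap MPI_Finalize() and MPI_Abort(MPI_COMM_WORLD, code).
struct MpiShutdown
{
  std::function<void()>    finalize;
  std::function<void(int)> abort;
};

// Returns the exit code of the test driver. A failing test aborts instead of
// finalizing, so it never enters the collective MPI_Finalize() call.
inline int finish_test(int result, MpiShutdown const& mpi)
{
  assert(mpi.finalize && mpi.abort);
  if (result != EXIT_SUCCESS)
  {
    mpi.abort(result); // a real MPI_Abort() is not expected to return
    return result;     // reached only with a stub backend
  }
  mpi.finalize();
  return EXIT_SUCCESS;
}
```

Propagating result rather than a hard-coded -1 keeps the original failure code visible to the caller. Whether aborting is acceptable for every failing test is exactly the open question here.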
Configuration information
I've loaded current OTB (6.7?) modules on HAL: otb/develop-python3 and otb-depends/develop-python3
Other remarks
The use of the MPI singleton seems convoluted in the test. I would have worked with something like:
// mpi_helper.h
#include <boost/noncopyable.hpp>
#include <cassert>
#include <memory>

namespace mpi {
class Singleton : boost::noncopyable
{
public:
    static Singleton& create(int argc, char** argv) {
        assert(!ms_instance);
        ms_instance = std::make_unique<Singleton>(argc, argv);
        return *ms_instance;
    }
    static Singleton& instance() {
        assert(ms_instance);
        return *ms_instance;
    }
    Singleton(int argc, char** argv) { .... }
    ~Singleton() { /* mpi_abort or mpi_terminate, depending on exceptions and so on... */ }
    static void abort(int errorcode) { ... } // needs to clear the singleton as well
    void barrier() { ... }
    // do we really need an explicit terminate? if yes, it should be static as well to clear the singleton
private:
    static std::unique_ptr<Singleton> ms_instance; // static: shared by create() and instance()
};
}
// otbTestDriver.h
...
int main(int argc, char** argv) {
#if defined(USE_MPI)
    mpi::Singleton::create(argc, argv);
#endif
    ...
    if (result != EXIT_SUCCESS) {
#if defined(USE_MPI)
        mpi::Singleton::abort(result);
#else
        return result;
#endif
    }
    ....
} // no need to explicitly call terminate
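The "abort or terminate depending on exceptions" part of the destructor could be sketched as follows, assuming C++17 for std::uncaught_exceptions(). ShutdownGuard and run_guarded are hypothetical names, and the callbacks again stand in for MPI_Finalize() and MPI_Abort(). The sketch uses a stack object rather than a static one, because a static object is destroyed after main() returns, when no exception is in flight any more.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>
#include <functional>
#include <utility>

// Hypothetical guard: chooses between abort and finalize in its destructor.
class ShutdownGuard
{
public:
  ShutdownGuard(std::function<void()> finalize, std::function<void(int)> abort)
    : m_finalize(std::move(finalize)),
      m_abort(std::move(abort)),
      m_pending(std::uncaught_exceptions())
  {
    assert(m_finalize && m_abort);
  }
  ShutdownGuard(ShutdownGuard const&) = delete;
  ShutdownGuard& operator=(ShutdownGuard const&) = delete;

  ~ShutdownGuard()
  {
    // More exceptions in flight than at construction: the stack is unwinding.
    if (std::uncaught_exceptions() > m_pending)
      m_abort(EXIT_FAILURE);
    else
      m_finalize();
  }

private:
  std::function<void()>    m_finalize;
  std::function<void(int)> m_abort;
  int                      m_pending;
};

// Runs body under a guard. The try/catch guarantees that the stack is
// unwound, so the guard's destructor runs even when body throws.
inline int run_guarded(std::function<void()> const& body,
                       std::function<void()> const& finalize,
                       std::function<void(int)> const& abort)
{
  try
  {
    ShutdownGuard guard(finalize, abort);
    body();
    return EXIT_SUCCESS;
  }
  catch (std::exception const&)
  {
    return EXIT_FAILURE;
  }
}
```

With real MPI calls plugged in, the abort branch would end the process during unwinding, so the catch block matters only for the stub backend used here.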