Pakman
As mentioned on the main page, you will likely run into problems if your simulator uses MPI and you try to use it with MPIMaster directly. These problems arise because the behavior of an MPI application that forks another MPI application is not defined by the MPI standard, and current MPI libraries do not support it (and likely never will).
Fortunately, the MPI standard provides a native mechanism for creating new processes. In particular, the MPI function MPI_Comm_spawn spawns new MPI processes and returns an intercommunicator through which the parent can communicate with the spawned child processes.
However, in order to maintain portability, spawned MPI processes have certain limitations when it comes to process control. Firstly, there is no way to force the termination of a process spawned using MPI_Comm_spawn. When using a "standard" (i.e., forked) simulator, Pakman can send system signals to terminate Workers before they have finished their simulations. For example, it does this when an ABC SMC generation has finished or when the requisite number of parameters has been accepted in ABC rejection. This is not possible for spawned MPI processes, however, so Pakman has no choice but to wait for the simulation to finish.
Secondly, it is impossible to discard the standard error of a process created using MPI_Comm_spawn, so the flag --discard-child-stderr does not have any effect. Thirdly, spawned MPI programs should not be wrapped in a shell script because this is not defined by the MPI standard (even though some MPI libraries may support this).
Most importantly, using an MPI-based simulator with Pakman breaks Pakman's modular framework. This is because the communication between Pakman and the simulator now has to happen through MPI instead of through standard input and output. The simulator is thus no longer a black box at the systems level, but has to be implemented as a C or C++ function. As a result, the protocol for communicating between Pakman and the simulator has to be compiled into the user executable.
We have implemented the Pakman–Worker communication protocol as a C header and as a C++ header in pakman_mpi_worker.h and PakmanMPIWorker.hpp, respectively. These headers define a Pakman MPI Worker. The two roles of a Pakman MPI Worker are to communicate with Pakman and to run user-defined simulations.
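When an MPI simulator is used, Pakman must be invoked with the flag --mpi-simulator so that Pakman knows it is dealing with an MPI simulator. An example is given on a separate wiki page.
When writing an MPI simulator in C, the user needs to define a function with the following signature: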
This function should perform the same tasks as a standard simulator executable. Given some command-line arguments and an input_string (which contains a tolerance and a parameter), it runs a simulation, compares the simulated data to the observed data, and returns an output string (containing either accept or reject). Moreover, the return value is treated as the "exit code" of the simulation; a nonzero return value indicates that an error occurred during the simulation.
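A minimal sketch of such a simulator function is shown below. It rests on three assumptions, all to be checked against pakman_mpi_worker.h:

* the function has the C-style signature int (int argc, char *argv[], const char *input_string, char **p_output_string);
* the input string holds the tolerance and the parameter on two lines;
* the output string is allocated with malloc and freed by the caller.

The names toy_simulator and OBSERVED_VALUE are hypothetical. The "simulation" simply accepts a parameter that lies within the tolerance of a single observed value. The code sticks to C headers and C idioms so that it compiles as both C and C++.

```cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical observed datum; a real simulator would load its observed data. */
static const double OBSERVED_VALUE = 3.0;

/* Hypothetical toy simulator with the assumed C-style signature.
 * Assumed input format: "<tolerance>\n<parameter>\n".
 * On success, stores a malloc'ed "accept\n" or "reject\n" in *p_output_string
 * and returns 0; returns nonzero if the input cannot be parsed. */
int toy_simulator(int argc, char *argv[],
                  const char *input_string, char **p_output_string)
{
    double tolerance, parameter, distance;
    const char *verdict;

    (void) argc; /* this toy simulator takes no command-line arguments */
    (void) argv;
    *p_output_string = NULL;

    /* Parse tolerance and parameter; failure becomes a nonzero "exit code". */
    if (sscanf(input_string, "%lf %lf", &tolerance, &parameter) != 2)
        return 1;

    /* "Simulate": here the simulated datum is just the parameter itself. */
    distance = parameter - OBSERVED_VALUE;
    if (distance < 0.0)
        distance = -distance;

    /* Compare simulated and observed data against the tolerance. */
    verdict = (distance <= tolerance) ? "accept\n" : "reject\n";

    /* Heap-allocate the output string; the caller is assumed to free it. */
    *p_output_string = (char *) malloc(strlen(verdict) + 1);
    if (*p_output_string == NULL)
        return 2;
    strcpy(*p_output_string, verdict);
    return 0;
}
```

For example, the input "0.5\n3.2\n" produces the output "accept\n" with return value 0, while "0.5\n4.0\n" produces "reject\n".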
Once defined, this function should be passed as a function pointer to pakman_run_mpi_worker(), which is defined in pakman_mpi_worker.h with the signature
Hence the template for an MPI simulator written in C is as follows:
Note that the user still needs to call MPI_Init and MPI_Finalize; otherwise, the MPI Worker will attempt to call MPI functions before MPI has been initialized.
When Pakman is invoked with an MPI simulator, it spawns the MPI Worker once, and the Worker then repeatedly executes the simulator function to run simulations. When Pakman terminates, it sends a termination message to the MPI Worker through MPI; only then does the MPI Worker terminate. This is in contrast to the standard Worker, which is forked from Pakman for each simulation.
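This lifecycle can be sketched as a simple message loop: one long-lived Worker, many simulations, and an explicit termination message.

* MPI is replaced by an in-memory list of messages.
* Tag, Message, Result and run_worker_loop are hypothetical names. The actual protocol is implemented in pakman_mpi_worker.h and PakmanMPIWorker.hpp and may differ in its details.
* The simulator is passed as a function pointer with the assumed C-style signature int (*)(int, char **, const char *, char **), and its output is assumed to be malloc'ed.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical message tags standing in for the MPI messages exchanged with Pakman.
enum class Tag { RunSimulation, Terminate };

// Hypothetical message: for RunSimulation the payload is the simulator input
// string (tolerance and parameter); for Terminate it is unused.
struct Message {
    Tag tag;
    std::string payload;
};

// What the Worker would send back to Pakman after each simulation.
struct Result {
    std::string output; // simulator output, e.g. "accept\n"
    int exit_code;      // simulator return value; nonzero signals an error
};

// Assumed C-style simulator signature; the output string is assumed to be malloc'ed.
using SimulatorFn = int (*)(int, char **, const char *, char **);

// Hypothetical Worker loop: started once, it runs one simulation per request
// and stops when it receives a Terminate message.
std::vector<Result> run_worker_loop(const std::vector<Message> &inbox,
                                    SimulatorFn simulator,
                                    int argc, char **argv)
{
    std::vector<Result> sent; // stands in for the results sent back to Pakman
    for (const Message &msg : inbox) {
        if (msg.tag == Tag::Terminate)
            break; // Pakman's termination message ends the loop
        char *output = nullptr;
        int exit_code = simulator(argc, argv, msg.payload.c_str(), &output);
        sent.push_back({output != nullptr ? std::string(output) : std::string(),
                        exit_code});
        std::free(output); // release the simulator's buffer (free(nullptr) is a no-op)
    }
    return sent;
}
```

Because the loop only exits on a Terminate message, a single Worker serves every simulation request. A standard Worker, by contrast, is forked anew for each simulation.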
The following MPI simulator example is taken from mpi-simulator.c. It is a dummy simulator that by default simply outputs accept and exits with a zero error code. In addition, the output message and the error code can be specified with optional command-line arguments.
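As an illustration of the behavior just described, a dummy simulator could look like the sketch below. This is not the actual contents of mpi-simulator.c. It makes the following assumptions:

* it has the C-style signature int (int argc, char *argv[], const char *input_string, char **p_output_string);
* argv[0] is the program name;
* argv[1] is an optional output message and argv[2] an optional error code.

The name dummy_simulator is hypothetical. Only C headers and C idioms are used, so it compiles as both C and C++.

```cpp
#include <stdlib.h>
#include <string.h>

/* Hypothetical dummy simulator with the assumed C-style signature.
 * Assumed usage: <program> [output_message [error_code]], with argv[0]
 * being the program name. */
int dummy_simulator(int argc, char *argv[],
                    const char *input_string, char **p_output_string)
{
    const char *message = "accept\n"; /* default output message */
    int error_code = 0;               /* default "exit code" */

    (void) input_string;              /* a dummy simulator ignores its input */

    if (argc > 1)
        message = argv[1];            /* optional output message */
    if (argc > 2)
        error_code = atoi(argv[2]);   /* optional error code */

    /* Copy the message into a heap buffer that the caller is assumed to free. */
    *p_output_string = (char *) malloc(strlen(message) + 1);
    if (*p_output_string == NULL)
        return 1;
    strcpy(*p_output_string, message);
    return error_code;
}
```

With no extra arguments it writes "accept\n" and returns 0. Given argv = {"mpi-simulator", "reject\n", "3"}, it writes "reject\n" and returns 3.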
When writing an MPI simulator in C++, the user can use pakman_mpi_worker.h as before. However, it is also possible to use the PakmanMPIWorker class, defined in PakmanMPIWorker.hpp. This approach has the advantage that it is not restricted to function pointers, but instead accepts std::function objects. The class template std::function can wrap any callable object, including function pointers, lambda expressions and function objects. In our case, the expected std::function object is of the following type:
The arguments and return value retain the same meaning as in pakman_mpi_worker.h.
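The sketch below illustrates why std::function is more flexible than a function pointer. It assumes the simulator type is std::function<int(int argc, char **argv, const std::string &input_string, std::string &output_string)>. That is, it has the C signature but returns the output string through a reference; check PakmanMPIWorker.hpp for the exact type. SimulatorFunction, plain_simulator, ThresholdSimulator and make_counting_simulator are hypothetical names. A plain function, a function object with state and a capturing lambda all convert to the same std::function type, whereas only the first could be passed as a C function pointer.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Assumed simulator type (hypothetical alias; see PakmanMPIWorker.hpp for the real one).
using SimulatorFunction = std::function<int(int argc, char **argv,
                                            const std::string &input_string,
                                            std::string &output_string)>;

// 1. A plain function: the only kind of callable a C function pointer can hold.
int plain_simulator(int, char **, const std::string &, std::string &output_string)
{
    output_string = "accept\n";
    return 0;
}

// 2. A function object carrying its own state (the observed value).
struct ThresholdSimulator {
    double observed;

    int operator()(int, char **, const std::string &input_string,
                   std::string &output_string) const
    {
        // Assumed input format: "<tolerance>\n<parameter>\n".
        // std::stod throws std::invalid_argument on malformed input.
        std::size_t pos = 0;
        double tolerance = std::stod(input_string, &pos);
        double parameter = std::stod(input_string.substr(pos));
        double distance = parameter > observed ? parameter - observed
                                               : observed - parameter;
        output_string = distance <= tolerance ? "accept\n" : "reject\n";
        return 0;
    }
};

// 3. A lambda capturing a caller-owned counter by reference.
SimulatorFunction make_counting_simulator(int &calls)
{
    return [&calls](int, char **, const std::string &, std::string &output_string) {
        ++calls;
        output_string = "reject\n";
        return 0;
    };
}
```

The function object and the lambda carry state (the observed value and a call counter) without resorting to global variables. That is the main practical advantage over plain function pointers.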
This function object should be passed to the constructor of PakmanMPIWorker. Then, to run the MPI Worker, the user needs to call PakmanMPIWorker::run() on the created object.
Assuming that the simulator function is written as a normal function, the template of an MPI simulator written with PakmanMPIWorker.hpp is as follows:
As before, the user still needs to call MPI_Init and MPI_Finalize.
The following MPI simulator example is taken from mpi-simulator-cpp.cc. It is a dummy simulator that by default simply outputs accept and exits with a zero error code. Furthermore, the output message and the error code can be specified with optional command-line arguments.
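As an illustration of the described behavior, a C++ dummy simulator could look like the following sketch. This is not the actual contents of mpi-simulator-cpp.cc. It assumes the std::function-compatible signature int (int argc, char **argv, const std::string &input_string, std::string &output_string), with argv[1] as an optional output message and argv[2] as an optional error code. The name dummy_simulator_cpp is hypothetical.

```cpp
#include <string>

// Hypothetical C++ dummy simulator with the assumed signature.
// Assumed usage: <program> [output_message [error_code]], argv[0] = program name.
int dummy_simulator_cpp(int argc, char **argv,
                        const std::string & /* input_string, ignored */,
                        std::string &output_string)
{
    output_string = "accept\n";          // default output message
    int error_code = 0;                  // default "exit code"

    if (argc > 1)
        output_string = argv[1];         // optional output message
    if (argc > 2)
        error_code = std::stoi(argv[2]); // optional error code (throws if not numeric)

    return error_code;
}
```

Because the output is written into a std::string passed by reference, the simulator needs no malloc or free.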