############################################################################
``pycbc_make_inference_workflow``: A parameter estimation workflow generator
############################################################################
===============
Introduction
===============
The executable ``pycbc_make_inference_workflow`` is a workflow generator that
sets up a parameter estimation analysis. It can be configured to run on one or
more events at once. For each event, the workflow:
#. Runs ``pycbc_inference``. If desired, you can run multiple independent
instances of ``pycbc_inference`` on the same event.
#. Extracts a posterior file using ``pycbc_inference_extract_samples``. If
multiple instances of ``pycbc_inference`` were run on the same event, the
samples from all of the runs will be combined into a single posterior file.
You can also have derived parameters written out to the posterior file.
#. Makes various posterior plots and tables. The prior is also plotted. If
you are analyzing gravitational-wave data, a plot of the power spectral density
(PSD) used for each event is also created.
#. If you are working in a Python 3.x environment you can optionally have
the workflow produce a skymap for each event (this requires ``ligo.skymap``
to be installed).
#. Optionally creates sampler-dependent diagnostic plots.
#. Generates a results html page that gathers all of the results.
The workflow generator requires a configuration file that tells it what plots
to make, what parameters to produce posteriors for, which events to analyze,
and any other settings to use for the various executables that are run.
For each event, one or more inference configuration files (the file(s) passed
to ``pycbc_inference``) must also be provided. These are separate from the
workflow configuration file, as they describe how to analyze each event. You
tell the workflow how many events to analyze and which inference configuration
files to use for each event via ``[event-{label}]`` sections in the workflow
configuration file. Here, ``{label}`` is a unique label for each event.
To illustrate how to set up and use a workflow, below we provide an example
that analyzes two binary black hole events at once: GW150914 and GW170814.
================================================
Example: GW150914 and GW170814 with ``emcee_pt``
================================================
In this example we set up a workflow to analyze GW150914 and GW170814 using
``emcee_pt``. We will use a prior that is uniform in comoving volume and
uniform in source masses. As we will be using the ``IMRPhenomPv2`` waveform
approximant, we will use the ``marginalized_phase`` Gaussian noise model.
This workflow will produce a results page that looks like the example
`inference-gw150914_gw170814 `_.
The inference configuration files we will use can all be found in the PyCBC
``examples`` directory. Below, we provide instructions on what files need
to be downloaded, and how to set up and run the workflow.
-------------------------------------
Get the inference configuration files
-------------------------------------
We need the configuration files for ``pycbc_inference``. These define the
prior, model, sampler, and data to use for each event.
**The prior:**
.. literalinclude:: ../../examples/inference/priors/bbh-uniform_comoving_volume.ini
:language: ini
:download:`Download <../../examples/inference/priors/bbh-uniform_comoving_volume.ini>`
**The model:**
.. literalinclude:: ../../examples/inference/models/marginalized_phase.ini
:language: ini
:download:`Download <../../examples/inference/models/marginalized_phase.ini>`
**The sampler:**
.. literalinclude:: ../../examples/inference/samplers/emcee_pt-srcmasses_comoving_volume.ini
:language: ini
:download:`Download <../../examples/inference/samplers/emcee_pt-srcmasses_comoving_volume.ini>`
**The data:** We also need configuration files for the data. Since GW150914
occurred during O1 while GW170814 occurred during O2, we need both the standard
O1 and O2 files:
.. literalinclude:: ../../examples/inference/data/o1.ini
:language: ini
:download:`Download <../../examples/inference/data/o1.ini>`
.. literalinclude:: ../../examples/inference/data/o2.ini
:language: ini
:download:`Download <../../examples/inference/data/o2.ini>`
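Before moving on, it can help to confirm that all five inference configuration
files listed above are in your working directory. The following is a minimal
sketch of such a check. The file names are taken from the download links above;
the helper name ``missing_files`` is hypothetical and not part of PyCBC.

.. code-block:: bash

    #!/usr/bin/env bash
    # Inference configuration files used in this example
    # (names taken from the download links above).
    required_files=(
        bbh-uniform_comoving_volume.ini
        marginalized_phase.ini
        emcee_pt-srcmasses_comoving_volume.ini
        o1.ini
        o2.ini
    )

    # missing_files DIR: print the name of every required file that is not
    # found in DIR (default: current directory). Returns non-zero if any
    # file is missing. This is an illustrative helper, not a PyCBC tool.
    missing_files() {
        local dir="${1:-.}" status=0 f
        for f in "${required_files[@]}"; do
            if [ ! -f "${dir}/${f}" ]; then
                echo "${f}"
                status=1
            fi
        done
        return "${status}"
    }

    # Check the current directory and print a reminder if anything is missing.
    missing_files . || echo "Download the files listed above before continuing." >&2

If the function prints nothing, every file is present.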
-------------------------------------
Setup the workflow configuration file
-------------------------------------
As discussed above, the workflow configuration file specifies what events to
analyze, what programs to run, and what settings to use for those programs.
Since the same general workflow settings can be used for different classes of
events, here we have split the workflow configuration file into two separate
files, ``events.ini`` and ``workflow_config.ini``. The former specifies what
events we are analyzing in this run, while the latter specifies all of the
other settings. As we will see below, we can simply provide these two files to
``pycbc_make_inference_workflow``'s ``--config-file`` argument; it will
automatically combine them into a single file.
The events:
.. literalinclude:: ../../examples/workflow/inference/gw150914_gw170814-emcee_pt/events.ini
:language: ini
:download:`Download <../../examples/workflow/inference/gw150914_gw170814-emcee_pt/events.ini>`
The rest of the configuration file:
.. literalinclude:: ../../examples/workflow/inference/gw150914_gw170814-emcee_pt/workflow_config.ini
:language: ini
:download:`Download <../../examples/workflow/inference/gw150914_gw170814-emcee_pt/workflow_config.ini>`
**Notes**:
* Since the ``[executables]`` section contains entries for
``create_fits_file`` and ``plot_skymap``, the workflow will try to create
sky maps. **This requires a Python 3.x environment and** ``ligo.skymap``
**to be installed.** If you have not installed ``ligo.skymap`` yet, do so by
running::
pip install ligo.skymap
* If you do not want to create sky maps, or are running a Python 2.7
environment, you can turn this off by simply commenting out or removing
``create_fits_file`` and ``plot_skymap`` from the ``[executables]`` section.
* The number of cores that will be used by ``pycbc_inference`` is set by the
``nprocesses`` argument in the ``[inference]`` section. You should set this
to the number of cores you expect to be able to get on your cluster. In the
configuration presented here, we are limited to shared memory cores. (It
is possible to run using MPI in order to parallelize over a larger number
of cores, but that requires special condor settings that must be implemented
by your cluster admins. That is outside the scope of these instructions.)
* Notice that the number of processes that ``pycbc_inference`` will use is
referenced by the ``condor|request_cpus`` argument in the
``[pegasus_profile-inference]`` section. This argument is what tells
condor how many cores to assign to the job, and so sets the actual number
of resources ``pycbc_inference`` will get. Generally, you want this to
be the same as what is fed to ``pycbc_inference``'s ``nprocesses``
option.
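The two core-count settings discussed in the notes above can be inspected from
the command line before generating the workflow. The following is a minimal
sketch, assuming ``workflow_config.ini`` is in the current directory; the
helper name ``show_core_settings`` is hypothetical and not part of PyCBC.

.. code-block:: bash

    #!/usr/bin/env bash
    # show_core_settings FILE: print, with line numbers, every line in FILE
    # that sets ``nprocesses`` or ``condor|request_cpus``.
    # Illustrative helper only; it does not check which section a line is in.
    show_core_settings() {
        grep -n -E '^[[:space:]]*(nprocesses|condor[|]request_cpus)[[:space:]]*=' "$1"
    }

    # Inspect the workflow configuration file if it has been downloaded.
    if [ -f workflow_config.ini ]; then
        show_core_settings workflow_config.ini
    fi

Comparing the two printed lines is a quick way to confirm that the number of
cores requested from condor agrees with what ``pycbc_inference`` will use.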
The ``workflow_config.ini`` file can be used with any of the MCMC samplers when
analyzing a gravitational wave that involves the parameters mentioned in the
file. If you wanted to analyze other binary black holes, you could use this
same file, simply changing the ``events.ini`` file to point to the events
you want to analyze.
---------------------
Generate the workflow
---------------------
Assuming that you have downloaded all of the configuration files to the
same directory, you can generate the workflow by running the following script:
.. literalinclude:: ../../examples/workflow/inference/gw150914_gw170814-emcee_pt/create_workflow.sh
:language: bash
:download:`Download <../../examples/workflow/inference/gw150914_gw170814-emcee_pt/create_workflow.sh>`
Note that you need to set the ``HTML_DIR`` before running. This tells the
workflow where to save the results page when done. You can also change
``WORKFLOW_NAME`` if you like.
You should also change the ``SEED`` every time you create a different workflow.
This sets the seed that is passed to ``pycbc_inference`` (you set it here
because it will be incremented for every ``pycbc_inference`` job that will be
run in the workflow).
After the workflow generator has finished, it will have created a directory named
``${WORKFLOW_NAME}-output``. This contains the ``dax`` and all necessary files
to run the workflow.
-----------------------------
Plan and execute the workflow
-----------------------------
Change directory into the ``${WORKFLOW_NAME}-output`` directory::
cd ${WORKFLOW_NAME}-output
If you are on the ATLAS cluster (at AEI Hannover) or on an LDG cluster, you
need to define an accounting group tag (talk to your cluster admins if you do
not know what this is). Once you know what accounting-group tag to use, plan
and submit the workflow with::
# submit workflow
pycbc_submit_dax --dax ${WORKFLOW_NAME}.dax \
--no-grid \
--no-create-proxy \
--enable-shared-filesystem \
--accounting-group ${ACCOUNTING_GROUP}
Here, ``${ACCOUNTING_GROUP}`` is the appropriate tag for your workflow.
Once it is running, you can monitor the status of the workflow by running
``./status`` from within the ``${WORKFLOW_NAME}-output`` directory. If your
workflow fails for any reason, you can see what caused the failure by running
``./debug``. If you need to stop the workflow at any point, run ``./stop``.
To resume a workflow, run ``./start``. If the ``pycbc_inference`` jobs were
still running, and they had checkpointed, they will resume from their last
checkpoint upon restart.
------------
Results page
------------
When the workflow has completed successfully, it will write out the results
page to the directory you specified in the ``create_workflow.sh`` script.
You can see what the results page will look like in the example
`inference-gw150914_gw170814 `_.
===============================================
Example: GW150914 and GW170814 with ``dynesty``
===============================================
In this example, we repeat the above analysis, but using the ``dynesty``
sampler. We can use the same
:download:`prior <../../examples/inference/priors/bbh-uniform_comoving_volume.ini>`,
:download:`model <../../examples/inference/models/marginalized_phase.ini>`,
and :download:`o1 <../../examples/inference/data/o1.ini>` and
:download:`o2 <../../examples/inference/data/o2.ini>` inference configuration
files as above. New files that we need are:
* The sampler configuration file for ``dynesty``:
.. literalinclude:: ../../examples/inference/samplers/dynesty.ini
:language: ini
:download:`Download <../../examples/inference/samplers/dynesty.ini>`
* An ``events`` file which uses ``dynesty``:
.. literalinclude:: ../../examples/workflow/inference/gw150914_gw170814-dynesty/events.ini
:language: ini
:download:`Download <../../examples/workflow/inference/gw150914_gw170814-dynesty/events.ini>`
Note that here, we are not running ``pycbc_inference`` multiple times. This is
because a single run of ``dynesty`` with the settings we are using (2000 live
points) produces a large number of (O(10 000)) samples.
We also need a slightly different
:download:`workflow configuration file <../../examples/workflow/inference/gw150914_gw170814-dynesty/workflow_config.ini>`. The only difference from the workflow configuration file above
is that the diagnostic plot executables (``plot_acceptance_rate`` and
``plot_samples``) have been removed. This is because these
diagnostics do not work for ``dynesty``, a nested sampler. As above, **set the**
``nprocesses`` **argument in the** ``[inference]`` **section to the number of cores that
works for your cluster.**
Note that we could have run both the ``emcee_pt`` analysis, above, and the
``dynesty`` analysis together in a single workflow. However, to do so, we would
need to remove any diagnostic plots that are unique to each sampler.
Once you have downloaded the necessary files, create the workflow and launch
it using the same ``create_workflow.sh`` script and ``pycbc_submit_dax``
commands as above, making sure to change the ``WORKFLOW_NAME`` and ``SEED``.
This will produce a results page that looks like the example
`inference-dynesty-gw150914_gw170814 `_.