pycbc_make_offline_grb_workflow: A GRB triggered CBC analysis workflow generator

Introduction

PyGRB is a tool used to generate a data analysis workflow for a targeted, coherent gravitational wave search triggered by short duration gamma-ray bursts.

When submitted, this workflow will run a pipeline to analyse data from multiple gravitational wave detectors coherently. It will then perform various signal-based veto cuts and data quality cuts to determine whether or not a compact binary coalescence signal is present in the given data coming from the same point in the sky and at the same time as an observed short duration gamma-ray burst.

The output will be a webpage containing plots and other data files that can be used to understand the results of the analysis. At the moment, the old results webpage generator (pycbc_make_grb_summary_page) is gradually being replaced by pycbc_pygrb_pp_workflow, which for the time being needs to be run separately.

Configuration File

The workflow is controlled by a configuration file, which comprises three types of section: workflow, Pegasus profile and executable options. The workflow sections control the general form of the workflow and how it is generated. The Pegasus profile sections are equivalent to lines you would have in a condor_submit file (e.g. requirements, storage size, etc.). The executable option sections contain the options to be fed directly to the executables that will be used for the data analysis.

The following examples would all feature in an offline search on combined Hanford-Livingston data from the 8th Advanced LIGO engineering run (ER8). To find out more details about the possible options for any stage of the workflow, follow the links at Workflow: the inspiral analysis workflow generator (pycbc.workflow).

Workflow Sections

The main section usually contains overall properties for your workflow, many of which will be used elsewhere:

[workflow]
file-retention-level = all_files
h1-channel-name = H1:GDS-CALIB_STRAIN
l1-channel-name = L1:GDS-CALIB_STRAIN

We may define all the detectors (IFOs) that we are considering in our analysis:

[workflow-ifos]
; This is the list of ifos to analyse
h1 =
l1 =
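As a rough illustration of how such a section can be read, the sketch below uses Python's standard configparser module to list the detectors named in [workflow-ifos]. This is not PyCBC's own configuration parser, whose behaviour may differ, and the helper name list_ifos is hypothetical.

```python
import configparser

CONFIG_TEXT = """
[workflow-ifos]
; This is the list of ifos to analyse
h1 =
l1 =
"""


def list_ifos(config_text):
    """Return the upper-cased option names of [workflow-ifos], sorted."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Each detector is an option with an empty value; only the names matter.
    return sorted(name.upper() for name in parser.options("workflow-ifos"))


print(list_ifos(CONFIG_TEXT))
```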

The data frame types and other related options are given in:

[workflow-datafind]
datafind-method = AT_RUNTIME_SINGLE_CACHES
datafind-check-segment-gaps = raise_error
datafind-check-frames-exist = raise_error
datafind-check-segment-summary = no_test
datafind-h1-frame-type = H1_HOFT_C00
datafind-l1-frame-type = L1_HOFT_C00

Data segment information is given in the section:

[workflow-segments]
segments-method = AT_RUNTIME
segments-h1-science-name = H1:DMT-ANALYSIS_READY:1
segments-l1-science-name = L1:DMT-ANALYSIS_READY:1
segments-database-url = https://segments.ligo.org
segments-veto-categories = 3
segments-minimum-segment-length = 256
segments-veto-definer-url = https://code.pycbc.phy.syr.edu/detchar/veto-definitions/download/master/cbc/ER8/H1L1-HOFT_C00_ER8B_CBC.xml

The GRB search requires an additional set of segment-related options, which we give in the following section:

[workflow-exttrig_segments]
; options for the coherent search (development)
on-before = 5
on-after = 1
min-before = 60
min-after = 60
min-duration = 256
max-duration = 5264
quanta = 128
num-buffer-before = 8
num-buffer-after = 8
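To make the first two options concrete, the sketch below assumes that on-before and on-after are the number of seconds before and after the trigger time that bound the on-source window. This reading is an assumption based on the option names, and the function name is hypothetical. The trigger time is the example GPS time used later on this page.

```python
def on_source_window(trigger_time, on_before, on_after):
    """Return (start, end) GPS times of the assumed on-source window."""
    return trigger_time - on_before, trigger_time + on_after


start, end = on_source_window(1125614344, on_before=5, on_after=1)
print(start, end, end - start)
```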

Executable Sections

We set the executables to be used for the analysis in the following way:

[executables]
inspiral                = ${which:lalapps_coh_PTF_inspiral}
splitbank               = ${which:pycbc_splitbank}
segment_query           = ${which:ligolw_segment_query_dqsegdb}
segments_from_cats      = ${which:ligolw_segments_from_cats_dqsegdb}
llwadd                  = ${which:ligolw_add}
ligolw_combine_segments = ${which:ligolw_combine_segments}
injections              = ${which:lalapps_inspinj}
jitter_skyloc           = ${which:ligolw_cbc_jitter_skyloc}
align_total_spin        = ${which:ligolw_cbc_align_total_spin}
split_inspinj           = ${which:pycbc_split_inspinj}
em_bright_filter        = ${which:pycbc_dark_vs_bright_injections}
trig_combiner           = ${which:pylal_cbc_cohptf_trig_combiner}
trig_cluster            = ${which:pylal_cbc_cohptf_trig_cluster}
injfinder               = ${which:pylal_cbc_cohptf_injfinder}
injcombiner             = ${which:pylal_cbc_cohptf_injcombiner}
sbv_plotter             = ${which:pylal_cbc_cohptf_sbv_plotter}
efficiency              = ${which:pylal_cbc_cohptf_efficiency}
inj_efficiency          = ${which:pylal_cbc_cohptf_efficiency}
horizon_dist            = ${which:pylal_cbc_cohptf_inspiral_horizon}

Here we are getting the executable paths from our environment for flexibility, rather than supplying them as fixed paths.
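The idea behind the ${which:...} syntax can be sketched in a few lines of Python. This is only an illustration built on the standard library's shutil.which, not the code PyCBC uses. The function name resolve_which, the finder argument and the /opt/pycbc/bin path are all hypothetical.

```python
import re
import shutil

# Matches ${which:NAME} and captures NAME.
WHICH_PATTERN = re.compile(r"\$\{which:([^}]+)\}")


def resolve_which(value, finder=shutil.which):
    """Replace each ${which:NAME} with the path that `finder` returns for NAME."""
    def substitute(match):
        name = match.group(1)
        path = finder(name)
        if path is None:
            raise FileNotFoundError(f"{name} was not found on PATH")
        return path

    return WHICH_PATTERN.sub(substitute, value)


# A stand-in finder, so the demonstration does not depend on what is installed.
def fake_finder(name):
    return "/opt/pycbc/bin/" + name


print(resolve_which("${which:pycbc_splitbank}", finder=fake_finder))
```

With the default finder, an executable that is missing from the environment raises an error at this point rather than later in the workflow.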

The options to be given to every job run by an executable are then given within a section with the relevant name. For example, our inspiral jobs (in this case, lalapps_coh_PTF_inspiral) use the options in the following section:

[inspiral]
ligo-calibrated-data = real_8
approximant = SpinTaylorT4
order = threePointFivePN
.
.
.

If the workflow were to contain multiple subclasses of inspiral jobs – for example one for standard signal hunting and some for finding injected signals – options could be provided separately to these subclasses in tagged sections. If the injection jobs are tagged in the workflow by the string coherent_injections, then options specific to these jobs may be given in the section:

[inspiral-coherent_injections]
inj-search-window = 1
inj-mchirp-window = 0.05
analyze-inj-segs-only =
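One way to picture how tagged sections combine with the untagged one is the sketch below, written with Python's standard configparser. It assumes that a tagged job receives the options of [inspiral] plus those of its tagged section, with the tagged values taking precedence. That precedence rule and the helper name options_for are assumptions made for illustration.

```python
import configparser

CONFIG_TEXT = """
[inspiral]
approximant = SpinTaylorT4
order = threePointFivePN

[inspiral-coherent_injections]
inj-search-window = 1
inj-mchirp-window = 0.05
analyze-inj-segs-only =
"""


def options_for(parser, executable, tags=()):
    """Merge [executable] with each [executable-tag] section, in that order."""
    merged = {}
    for section in [executable] + [f"{executable}-{tag}" for tag in tags]:
        if parser.has_section(section):
            # Later (tagged) sections overwrite earlier values (assumption).
            merged.update(parser.items(section))
    return merged


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
print(options_for(parser, "inspiral"))
print(options_for(parser, "inspiral", tags=["coherent_injections"]))
```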

Sections which share a common set of options may be given together:

[inspiral&workflow-exttrig_segments]
pad-data = 8

Here the workflow-exttrig_segments section and the inspiral executable section are sharing a common option.
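The effect of a combined section can be sketched as follows, again with Python's standard configparser rather than PyCBC's own parser. The helper expand_shared_sections is hypothetical; it copies each option of an a&b section into sections a and b and then removes the combined section.

```python
import configparser

CONFIG_TEXT = """
[inspiral]
approximant = SpinTaylorT4

[workflow-exttrig_segments]
on-before = 5

[inspiral&workflow-exttrig_segments]
pad-data = 8
"""


def expand_shared_sections(parser):
    """Copy options from every 'a&b' section into 'a' and 'b', then drop it."""
    for combined in [name for name in parser.sections() if "&" in name]:
        for target in combined.split("&"):
            if not parser.has_section(target):
                parser.add_section(target)
            for option, value in parser.items(combined):
                parser.set(target, option, value)
        parser.remove_section(combined)


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
expand_shared_sections(parser)
print(parser.get("inspiral", "pad-data"))
print(parser.get("workflow-exttrig_segments", "pad-data"))
```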

Pegasus Profile Sections

If, for example, we wished to ask condor to request nodes with 2000M of memory for the trig_combiner executable jobs, we may do this via:

[pegasus_profile-trig_combiner]
condor|request_memory=2000M

This can be generalised to any executable or tagged jobs.
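The structure of a profile line, a namespace and a key separated by a vertical bar followed by a value, can be seen by reading the section back. The sketch below uses Python's standard configparser, and the helper name profiles_for is hypothetical.

```python
import configparser

CONFIG_TEXT = """
[pegasus_profile-trig_combiner]
condor|request_memory=2000M
"""


def profiles_for(parser, executable):
    """Return (namespace, key, value) tuples from [pegasus_profile-<executable>]."""
    section = f"pegasus_profile-{executable}"
    profiles = []
    for option, value in parser.items(section):
        # The text before the bar is the namespace, the rest is the key.
        namespace, key = option.split("|", 1)
        profiles.append((namespace, key, value))
    return profiles


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
print(profiles_for(parser, "trig_combiner"))
```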

How to run

Here we document the stages needed to run the triggered coherent GRB search.

Once PyCBC is installed, you should be able to run the following help command for the workflow generation script:

pycbc_make_offline_grb_workflow --help

This should produce a help message like the following:

$ pycbc_make_offline_grb_workflow --help
No CuPy
No CuPy or GPU PhenomHM module.
No CuPy or GPU response available.
No CuPy or GPU interpolation available.
usage: pycbc_make_offline_grb_workflow [-h] [-v] [--version [VERSION]]
                                       [--config-files CONFIGFILE [CONFIGFILE ...]]
                                       [--config-overrides [SECTION:OPTION:VALUE ...]]
                                       [--config-delete [SECTION:OPTION ...]]
                                       --workflow-name WORKFLOW_NAME
                                       [--tags TAGS [TAGS ...]]
                                       [--output-dir OUTPUT_DIR]
                                       [--cache-file CACHE_FILE] [--plan-now]
                                       [--submit-now] [--dax-file DAX_FILE]

options:
  -h, --help            show this help message and exit

PyCBC common options:
  Common options for PyCBC executables.

  -v, --verbose         Add verbosity to logging. Adding the option multiple
                        times makes logging progressively more verbose, e.g.
                        --verbose or -v provides logging at the info level,
                        but -vv or --verbose --verbose provides debug logging.
  --version [VERSION]   Display PyCBC version information and exit. Can
                        optionally supply a modifier integer to control the
                        verbosity of the version information. 0 and 1 are the
                        same as --version; 2 provides more detailed PyCBC
                        library information; 3 provides information about
                        PyCBC, LAL and LALSimulation packages (if installed)

Configuration:
  Options needed for parsing config file(s).

  --config-files CONFIGFILE [CONFIGFILE ...]
                        List of config files to be used in analysis.
  --config-overrides [SECTION:OPTION:VALUE ...]
                        List of section,option,value combinations to add into
                        the configuration file. Normally the gps start and end
                        times might be provided this way, and user specific
                        locations (ie. output directories). This can also be
                        provided as SECTION:OPTION or SECTION:OPTION: both of
                        which indicate that the corresponding value is left
                        blank.
  --config-delete [SECTION:OPTION ...]
                        List of section,option combinations to delete from the
                        configuration file. This can also be provided as
                        SECTION which deletes the enture section from the
                        configuration file or SECTION:OPTION which deletes a
                        specific option from a given section.

Options for setting workflow files:
  --workflow-name WORKFLOW_NAME
                        Name of the workflow.
  --tags TAGS [TAGS ...]
                        Append the given tags to file names.
  --output-dir OUTPUT_DIR
                        Path to directory where the workflow will be written.
                        Default is to use {workflow-name}_output.
  --cache-file CACHE_FILE
                        Path to input file containing list of files to be
                        reused (the 'input_map' file)
  --plan-now            If given, workflow will immediately be planned on
                        completion of workflow generation but not submitted to
                        the condor pool. A start script will be created to
                        submit to condor.
  --submit-now          If given, workflow will immediately be submitted on
                        completion of workflow generation
  --dax-file DAX_FILE   Path to DAX file. Default is to write to the output
                        directory with name {workflow-name}.dax.

This outlines the command line arguments that may be passed to the executable. The majority of options passed to the workflow will come from configuration files, and these are known to the executable via the option --config-files.
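The SECTION:OPTION:VALUE form described in the help message can be illustrated with a small parser. This is a sketch of the format only, not PyCBC's implementation, and the function name parse_override is hypothetical. It assumes that everything after the second colon belongs to the value, so that values containing colons (such as URLs) survive intact.

```python
def parse_override(text):
    """Split 'SECTION:OPTION[:VALUE]' into (section, option, value)."""
    parts = text.split(":", 2)
    if len(parts) < 2:
        raise ValueError(f"expected SECTION:OPTION[:VALUE], got {text!r}")
    section, option = parts[0], parts[1]
    # SECTION:OPTION and SECTION:OPTION: both leave the value blank.
    value = parts[2] if len(parts) == 3 else ""
    return section, option, value


print(parse_override("workflow:ra:159.239"))
print(parse_override("workflow:html-dir"))
print(parse_override("workflow-segments:segments-database-url:https://segments.ligo.org"))
```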

Set up a run directory

Navigate to the directory you wish to run in:

RUN_DIR=/path/to/run/directory
mkdir -p $RUN_DIR
cd $RUN_DIR

Next gather together configuration files for your run.

Configuration files - Are you running from production configuration (.ini) files?

Yes, I want to run in a standard production configuration

The option --config-files takes a space-separated list of file locations. These can be URLs to remote file locations. Production configuration files may be found here (LIGO.ORG protected).

Therefore, an example run on a GRB from the 8th Advanced LIGO engineering run might use the following config files:

pycbc_make_offline_grb_workflow \
--config-files \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/analysis_er8.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/injections_er8.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/postprocessing_er8.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/data_er8b.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/offline_er8.ini

No, I have my own configuration files

The option --config-files takes a space-separated list of file locations. For example, you could provide a pair of local files:

pycbc_make_offline_grb_workflow \
--config-files \
/path/to/config_file_1.ini \
/path/to/config_file_2.ini

Now go down to Generate the workflow.

Generate the workflow

When you are ready, you can generate the workflow. As this is a triggered gravitational wave search, a number of key pieces of information will change between one GRB and the next, such as the time of the GRB or its position on the sky. The easiest way to supply these is to set a number of variables in your environment before launching the generation script.

First we need to set the trigger time, i.e. the GPS Earth-crossing time of the GRB signal. You should also set the GRB name. For example:

GRB_TIME=1125614344
GRB_NAME=150906B

We should next set the sky coordinates of the GRB in RA and Dec, in this example:

RA=159.239
DEC=-25.603
SKY_ERROR=0

If you are using a pregenerated template bank and do not have a path to the bank set in your config file, set it here:

BANK_FILE=path/to/templatebank

You also need to specify the git directory of your lalsuite install:

export LAL_SRC=/path/to/folder/containing/lalsuite.git

If you want the results page to be moved to a location outside of your run, provide this too:

export HTML_DIR=/path/to/html/folder

You can now create the workflow from within the run directory. The example below uses the production configuration files; if you are using locally edited or custom configuration files, substitute their paths:

pycbc_make_offline_grb_workflow \
--config-files \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/analysis_er8.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/injections_er8.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/postprocessing_er8.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/data_er8a.ini \
https://code.pycbc.phy.syr.edu/ligo-cbc/pycbc-config/download/master/ER8/pygrb/offline_er8.ini \
--config-overrides \
workflow:ra:${RA} \
workflow:dec:${DEC} \
workflow:sky-error:${SKY_ERROR} \
workflow:trigger-name:${GRB_NAME} \
workflow:trigger-time:${GRB_TIME} \
workflow:start-time:$(( GRB_TIME - 4096 )) \
workflow:end-time:$(( GRB_TIME + 4096 )) \
workflow:html-dir:${HTML_DIR}
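The override strings passed above follow a regular pattern, which the sketch below reproduces in Python for the example GRB. The function name build_overrides is hypothetical, and the 4096 s padding is simply the value used in the command above.

```python
def build_overrides(grb_name, grb_time, ra, dec, sky_error, html_dir, pad=4096):
    """Return the workflow:OPTION:VALUE strings for one GRB."""
    values = {
        "ra": ra,
        "dec": dec,
        "sky-error": sky_error,
        "trigger-name": grb_name,
        "trigger-time": grb_time,
        # The analysis span is the trigger time plus and minus `pad` seconds.
        "start-time": grb_time - pad,
        "end-time": grb_time + pad,
        "html-dir": html_dir,
    }
    return [f"workflow:{option}:{value}" for option, value in values.items()]


overrides = build_overrides(
    grb_name="150906B",
    grb_time=1125614344,
    ra=159.239,
    dec=-25.603,
    sky_error=0,
    html_dir="/path/to/html/folder",
)
for entry in overrides:
    print(entry)
```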

Planning and Submitting the Workflow

Change directory into the directory where the dax was generated:

cd GRB${GRB_NAME}

From the directory where the dax was created, run the submission script:

pycbc_submit_dax --dax pygrb_offline.dax --accounting-group <your.accounting.group.tag>

Note

If running on the ARCCA cluster, please provide a suitable directory via the option --local-dir, e.g. /var/tmp/${USER}

Monitor and Debug the Workflow (Detailed Pegasus Documentation)

To monitor the above workflow, one can run:

pegasus-status -cl /path/to/analysis/run

To get debugging information in the case of failures:

pegasus-analyzer /path/to/analysis/run

Pegasus Dashboard

The Pegasus dashboard is a visual and interactive way to get information about the progress, status, etc. of your workflows.

The software can be obtained from a separate Pegasus package here <https://github.com/pegasus-isi/pegasus-service>.

Pegasus Plots

Pegasus has a tool called pegasus-plots to visualize workflows. To generate these charts and create a summary HTML page with this information, one would run:

export PPLOTSDIR=${HTML_DIR}/pegasus_plots
pegasus-plots --plotting-level all --output ${PPLOTSDIR} /path/to/analysis/run

The Invocation Breakdown Chart section gives a snapshot of the workflow. You can click on the slices of the pie chart and it will report the number of failures, average runtime, and max/min runtime for that type of jobs in the workflow. The radio button labeled runtime will organize the pie chart by total runtime rather than the total number of jobs for each job type.

The Workflow Execution Gantt Chart section breaks down how long each job in the workflow took to run. You can click on a job in the Gantt chart and it will report the job name and runtime.

The Host Over Time Chart section displays a gantt chart where you can see what jobs in the workflow ran on a given machine.

Reuse of data from a previous workflow

One of the features of Pegasus is to reuse the data products of prior runs. This can be used to expand an analysis or recover a run with mistaken settings without duplicating work.

Generate the full workflow you want to do

First generate the full workflow for the run you would like to do as normal, following the instructions of this page from How to run, but stop before planning the workflow in Planning and Submitting the Workflow.

Select the files you want to reuse from the prior run

Locate the directory of the run that you would like to reuse. There is a file called GRB${GRB_NAME}/output.map that contains a listing of all of the data products of the prior workflow.

Select the entries for files that you would like to skip generating again and place them into a new file. The example below selects all the inspiral and tmpltbank jobs and places their entries into a new listing called prior_data.map:

# Let's get the tmpltbank entries
cat /path/to/old/run/GRB${GRB_NAME}/output.map | grep 'TMPLTBANK' > prior_data.map

# Add in the inspiral files
cat /path/to/old/run/GRB${GRB_NAME}/output.map | grep 'INSPIRAL' >> prior_data.map
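The same selection can be sketched in Python, which may be convenient when the list of keywords grows. The function name select_entries and the sample entries are hypothetical; real entries should be read from output.map.

```python
def select_entries(lines, keywords):
    """Keep the lines that contain any of the keywords, in their original order."""
    return [line for line in lines if any(key in line for key in keywords)]


# Made-up entries standing in for the contents of output.map.
sample_map = [
    "example-INSPIRAL-file /hypothetical/path/1",
    "example-TMPLTBANK-file /hypothetical/path/2",
    "example-SEGMENTS-file /hypothetical/path/3",
]

for entry in select_entries(sample_map, ["TMPLTBANK", "INSPIRAL"]):
    print(entry)
```

Unlike the two grep commands, this keeps the entries in the order they appear in the map rather than listing all the tmpltbank entries first.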

Note

You can include files in the prior data listing that wouldn’t be generated anyway by your new run. These are simply ignored.

Plan the workflow

From the directory where the dax was created, run the planning script:

pycbc_submit_dax --dax pygrb.dax --accounting-group <your.accounting.group.tag> --cache-file /path/to/prior_data.map

Follow the remaining Planning and Submitting the Workflow instructions to submit your reduced workflow.