pycbc_make_inference_workflow: A parameter estimation workflow generator

Introduction

The executable pycbc_make_inference_workflow is a workflow generator for setting up a parameter estimation analysis. It can be set up to run on one or more events at once. For each event, the workflow:

  1. Runs pycbc_inference. If desired, you can run multiple independent instances of pycbc_inference on the same event.

  2. Extracts a posterior file using pycbc_inference_extract_samples. If multiple instances of pycbc_inference were run on the same event, the samples from all of the runs will be combined into a single posterior file. You can also have derived parameters written out to the posterior file.

  3. Makes various posterior plots and tables. The prior is also plotted. If you are analyzing gravitational-wave data, a plot of the power spectral density (PSD) used for each event is also created.

  4. If you are working in a Python 3.x environment, you can optionally have the workflow produce a skymap for each event (this requires ligo.skymap to be installed).

  5. Optionally creates sampler-dependent diagnostic plots.

  6. Generates a results html page that gathers all of the results.

The workflow generator requires a configuration file that tells it what plots to make, what parameters to produce posteriors for, which events to analyze, and any other settings to use for the various executables that are run.

For each event, one or more inference configuration files (the file(s) passed to pycbc_inference) must also be provided. These are separate from the workflow configuration file, as they describe how to analyze each event. You tell the workflow how many events to analyze and which inference configuration files to use for each event via [event-{label}] sections in the workflow configuration file. Here, {label} is a unique label for each event.
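
For example, a schematic event section looks like the following. The file names and values here are placeholders meant only to illustrate the syntax; complete, working event sections are given in the example below.

[event-{label}]
; a descriptive label for the event (shown on the results page)
label = {label}
; the inference configuration file(s) to pass to pycbc_inference for this event
config-files = prior.ini
               model.ini
               sampler.ini
               data.ini
; optional overrides applied to those files, in section:option:value format
config-overrides = data:trigger-time:{gps-trigger-time}
; optional: the number of independent pycbc_inference runs to do for this
; event (the default is 1)
;nruns = 1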

To illustrate how to set up and use a workflow, below we provide an example of how to set up the workflow to analyze two binary black hole events at once: GW150914 and GW170814.

Example: GW150914 and GW170814 with emcee_pt

In this example we set up a workflow to analyze GW150914 and GW170814 using emcee_pt. We will use a prior that is uniform in comoving volume and uniform in source masses. As we will be using the IMRPhenomPv2 waveform approximant, we will use the marginalized_phase Gaussian noise model.

This workflow will produce a results page that looks like the example inference-gw150914_gw170814.

The inference configuration files we will use can all be found in the pycbc examples directory. Below, we provide instructions on what files need to be downloaded, and how to set up and run the workflow.

Get the inference configuration files

We need the configuration files for pycbc_inference. These define the prior, model, sampler, and data to use for each event.

The prior:

;==============================================================================
;
;                         Standard BBH Prior
;
;==============================================================================
;
; This configuration file provides a standard prior for binary-black holes
; (BBH). It uses a uniform prior on *source* masses, along with a uniform
; prior in comoving volume. Waveform transforms are included to convert the
; source masses into detector-frame masses using a standard cosmology
; (Planck 2015). The minimum and maximum volumes used correspond to
; luminosity distances of ~10Mpc and ~1.5Gpc, respectively. It can therefore
; be used with BBH in O1-O2. To use for future detectors, simply change the
; volume limits (a short sketch of how to compute new limits is given after
; this file).
;
; The coa_phase is not varied, so this has to be used with a model that
; marginalizes the phase automatically (e.g. the marginalized_phase or relbin
; models). If you are not using a model that marginalizes the phase, uncomment
; the coa_phase in the [variable_params], along with the [prior-coa_phase]
; section.
;
; The mass range used is 10-80, and so is fine for GW150914-like BBH. For
; lower-mass BBH, the prior range should be decreased. Keep in mind
; that lowering the mass prior increases the duration of the longest waveform
; admitted by the prior (meaning that you may need to change your
; analysis-start-time in your data section if you do that).
;
; The starting frequency of the waveform approximant is set to 20Hz (the
; f_lower and f_ref settings in the [static_params]). This is OK to use
; for the O1-O3 LIGO and Virgo detectors. With this lower-frequency cutoff
; and the lower-bound of the mass prior of 10, the longest waveform that may
; be generated is ~6s. Suggested analysis-start and end-time settings are -6
; and 2 (with respect to the trigger-time), respectively.
;
; You may wish to lower the lower-frequency cutoff for future detectors,
; for which the PSD has better low-frequency performance.
; Keep in mind that decreasing the lower-frequency cutoff will make the
; waveforms have longer duration in the time domain, and so the analysis
; start time will need to be adjusted.
;
; No [data], [model], or [sampler] sections are provided here. This should be
; used in tandem with additional configuration files that provide those
; sections.

[variable_params]
delta_tc = 
; Note that we call the masses srcmass[X]. This is because the waveform
; generator assumes that parameters called mass[X] are detector-frame masses.
; We therefore need to call the source masses something different; we choose
; "srcmass" here, but they could be called anything. In the waveform transforms
; sections below, we convert these to detector-frame masses.
srcmass1 = 
srcmass2 = 
spin1_a = 
spin1_azimuthal = 
spin1_polar = 
spin2_a = 
spin2_azimuthal = 
spin2_polar = 
comoving_volume = 
inclination = 
polarization = 
ra = 
dec = 
; Uncomment this if you are not using a model that marginalizes over phase.
;coa_phase =

[static_params]
approximant = IMRPhenomPv2
f_lower = 20
f_ref = 20
; The trigger time is used with delta_tc to get the coalescence time tc. We'll
; get the trigger time from the data section (provided in a separate file).
trigger_time = ${data|trigger-time}

;-----------------------------------------------------------------------------
;
;                           Intrinsic parameters
;
;-----------------------------------------------------------------------------

[prior-srcmass1]
name = uniform
min-srcmass1 = 10
max-srcmass1 = 80

[prior-srcmass2]
name = uniform
min-srcmass2 = 10
max-srcmass2 = 80

[prior-spin1_a]
name = uniform
min-spin1_a = 0.0
max-spin1_a = 0.99

[prior-spin1_polar+spin1_azimuthal]
name = uniform_solidangle
polar-angle = spin1_polar
azimuthal-angle = spin1_azimuthal

[prior-spin2_a]
name = uniform
min-spin2_a = 0.0
max-spin2_a = 0.99

[prior-spin2_polar+spin2_azimuthal]
name = uniform_solidangle
polar-angle = spin2_polar
azimuthal-angle = spin2_azimuthal

; The waveform generator expects spins to be in cartesian coordinates, with
; names spin(1|2)(x|y|z). We therefore need to provide a waveform transform
; that converts the spherical coordinates that we have defined the spin prior
; in to cartesian coordinates.
[waveform_transforms-spin1x+spin1y+spin1z]
name = spherical_to_cartesian
x = spin1x
y = spin1y
z = spin1z
radial = spin1_a
polar = spin1_polar
azimuthal = spin1_azimuthal

[waveform_transforms-spin2x+spin2y+spin2z]
name = spherical_to_cartesian
x = spin2x
y = spin2y
z = spin2z
radial = spin2_a
polar = spin2_polar
azimuthal = spin2_azimuthal

;-----------------------------------------------------------------------------
;
;                           Extrinsic parameters
;
;-----------------------------------------------------------------------------

[prior-delta_tc]
name = uniform
; We'll use +/-0.1s around the estimated coalescence (trigger) time.
min-delta_tc = -0.1
max-delta_tc = 0.1

[waveform_transforms-tc]
; The waveform generator needs tc, which we calculate here.
name = custom
inputs = trigger_time, delta_tc
tc = trigger_time + delta_tc

[prior-inclination]
name = sin_angle

; Uncomment this section if you are not using a model that marginalizes over
; the phase.
;[prior-coa_phase]
;name = uniform_angle

[prior-ra+dec]
name = uniform_sky

[prior-polarization]
name = uniform_angle

[prior-comoving_volume]
name = uniform
; These limits correspond to luminosity distances of ~[10, 1500) Mpc. Change
; if you are analyzing detections which are more than ~1Gpc away.
min-comoving_volume = 5e3
max-comoving_volume = 9e9

; The following [waveform_transforms] sections convert the comoving volume
; to luminosity distance and the source masses to detector frame masses.
; The latter is done by calculating redshift from the comoving volume first.
; The order that transforms need to be applied is figured out automatically by
; the code, so it doesn't matter what order we put them here, as long as we
; provide transforms for all intermediate steps.
[waveform_transforms-redshift]
name = custom
inputs = comoving_volume
redshift = redshift_from_comoving_volume(comoving_volume)

[waveform_transforms-distance]
name = custom
inputs = comoving_volume
distance = distance_from_comoving_volume(comoving_volume)

[waveform_transforms-mass1]
name = custom
inputs = srcmass1, redshift
mass1 = srcmass1 * (1 + redshift)

[waveform_transforms-mass2]
name = custom
inputs = srcmass2, redshift
mass2 = srcmass2 * (1 + redshift)

Download
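
If you need different volume limits (for example, for more distant events), you can compute them from the desired luminosity-distance range. Below is a minimal sketch of one way to do this, assuming astropy is installed and using its Planck15 cosmology (matching the Planck 2015 cosmology referenced above); the function name comoving_volume_limit is just for illustration.

import astropy.units as u
from astropy.cosmology import Planck15, z_at_value

def comoving_volume_limit(d_lum_mpc):
    """Comoving volume (in Mpc^3) enclosed at the given luminosity distance (in Mpc)."""
    z = z_at_value(Planck15.luminosity_distance, d_lum_mpc * u.Mpc)
    return Planck15.comoving_volume(z).to(u.Mpc**3).value

# e.g., for a prior extending from ~10 Mpc out to ~3 Gpc:
print(comoving_volume_limit(10))    # -> value to use for min-comoving_volume
print(comoving_volume_limit(3000))  # -> value to use for max-comoving_volume

The resulting values (in Mpc^3) can then be used for the min-comoving_volume and max-comoving_volume options in the [prior-comoving_volume] section.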

The model:

;==============================================================================
;
;             Gaussian noise model with marginalized phase
;
;==============================================================================
;
; This provides settings for the marginalized Gaussian noise model. The
; low-frequency-cutoff of 20Hz is adequate for the O1-O3 LIGO and Virgo
; detectors. Change as appropriate for future detectors.
;
; The check-for-valid-times and shift-psd-times-to-valid options mean that
; the model will check to see if there are any data quality flags on during
; the requested analysis times (determined by the [data] section). If there
; are any flags on during the analysis times for a detector, that detector
; will automatically be removed from the analysis (an error will be raised if
; all detectors are removed). If you do not want this to happen, uncomment
; the err-on-missing-detectors option. In that case, an error will be raised
; if any detectors have a data quality flag on. The shift-psd-times-to-valid
; argument will cause the times used for estimating the PSD (determined by
; the psd-start|end-time arguments in the [data] section) to be shifted left
; or right to avoid any data quality flags. A shift up to +/- dT/2 will be
; tried, where dT is the difference between psd-end-time and psd-start-time.
; If no valid data can be found even with the maximum shift, the detector
; will be removed from the analysis. To check for valid times, the dq-* options
; are used in the strain model. See
; http://pycbc.org/pycbc/latest/html/inference.html#setting-data for details.
;
; The ignore-failed-waveforms option tells the model to treat points in
; parameter space that cause the waveform generator to fail as having 0
; likelihood. This may be necessary for newer precessing models, in which
; the entire parameter space has not been fully tested. Note, however, that
; in that case you will not be able to tell if some parameters have been ruled
; out because the data disfavors them, or because the model failed. For this
; reason it is best to let the code raise an error (i.e., leave the option
; commented out), and only ignore these errors once you are confident you know
; the reason.
;
[model]
name = marginalized_phase
low-frequency-cutoff = 20
check-for-valid-times = 
shift-psd-times-to-valid =
;err-on-missing-detectors =
;ignore-failed-waveforms =

Download

The sampler:

;==============================================================================
;
;                Emcee PT settings for CBC, comoving volume
;
;==============================================================================
;
; The following provides standard settings for emcee_pt when analyzing a
; compact binary merger. This assumes that the prior is specified in terms
; of the source masses (srcmass1, srcmass2) and comoving volume
; (comoving_volume). To speed up convergence, the source masses are sampled in
; chirp mass and mass ratio, and the comoving volume is sampled in the log.
;
; We set the number of effective samples to 1500 because we've found that
; emcee_pt struggles to acquire more than ~8 independent samples per walker.
;
[sampler]
name = emcee_pt
nwalkers = 200
ntemps = 20
effective-nsamples = 1500
checkpoint-interval = 2000
max-samples-per-chain = 1000

[sampler-burn_in]
burn-in-test = nacl & max_posterior

[sampling_params]
srcmass1, srcmass2 = mchirp, q
comoving_volume = logv

[sampling_transforms-mchirp+q]
name = mass1_mass2_to_mchirp_q
mass1_param = srcmass1
mass2_param = srcmass2

[sampling_transforms-logv]
name = log
inputvar = comoving_volume
outputvar = logv

Download

The data: We also need configuration files for the data. Since GW150914 occurred during O1 while GW170814 occurred during O2, we need both the standard O1 and O2 files:

;==============================================================================
;
;                     Settings for analyzing O1 data
;
;==============================================================================
;
; This provides standard settings for analyzing H1 and L1 data in O1.
; It uses "OVERRIDE" for parameters that are event-specific. Replace OVERRIDE
; either by editing this file, or by using the config-overrides option in
; pycbc_inference.
[data]
instruments = H1 L1
trigger-time = OVERRIDE
analysis-start-time = OVERRIDE
analysis-end-time = OVERRIDE
psd-estimation = median-mean
psd-start-time = -256
psd-end-time = 256
psd-inverse-length = 8
psd-segment-length = 8
psd-segment-stride = 4
; If you are running on the Atlas cluster, an LDG cluster, or any computer
; with a ligo-data-server, you can use the frame-type argument to automatically
; locate the location of the frame files containing the data. If you are not
; running on one of those computers, download the necessary data from GWOSC
; (gwosc.org), remove the frame-type argument, and uncomment
; frame-files, pointing the latter to the files you downloaded.
;frame-files = H1:/PATH/TO/DOWNLOADED/H1FRAME.gwf L1:/PATH/TO/DOWNLOADED/L1FRAME.gwf
frame-type = H1:H1_LOSC_16_V1 L1:L1_LOSC_16_V1
channel-name = H1:GWOSC-16KHZ_R1_STRAIN L1:GWOSC-16KHZ_R1_STRAIN
; A sample rate of 2048 is sufficient for BBH. If you are analyzing a BNS or
; NSBH, change to 4096.
sample-rate = 2048
strain-high-pass = 15
pad-data = 8

Download

;==============================================================================
;
;                     Settings for analyzing O2 data
;
;==============================================================================
;
; This provides standard settings for analyzing H1, L1, and V1 data in O2.
; It uses "OVERRIDE" for parameters that event-specific. Replace OVERRIDE
; either by editing this file, or using the config-override option in
; pycbc_inference.
;
[data]
instruments = H1 L1 V1
trigger-time = OVERRIDE
analysis-start-time = OVERRIDE
analysis-end-time = OVERRIDE
psd-estimation = median-mean
psd-start-time = -256
psd-end-time = 256
psd-inverse-length = 8
psd-segment-length = 8
psd-segment-stride = 4
; If you are running on the Atlas cluster, an LDG cluster, or any computer
; with a ligo-data-server, you can use the frame-type argument to automatically
; locate the location of the frame files containing the data. If you are not
; running on one of those computers, download the necessary data from GWOSC
; (gwosc.org), remove the frame-type argument, and uncomment
; frame-files, pointing the latter to the files you downloaded.
frame-type = H1:H1_GWOSC_O2_16KHZ_R1 L1:L1_GWOSC_O2_16KHZ_R1 V1:V1_GWOSC_O2_16KHZ_R1
;frame-files = H1:/PATH/TO/DOWNLOADED/H1FRAME.gwf L1:/PATH/TO/DOWNLOADED/L1FRAME.gwf V1:/PATH/TO/DOWNLOADED/V1FRAME.gwf
channel-name = H1:GWOSC-16KHZ_R1_STRAIN L1:GWOSC-16KHZ_R1_STRAIN V1:GWOSC-16KHZ_R1_STRAIN
; A sample rate of 2048 is sufficient for BBH. If you are analyzing a BNS or
; NSBH, change to 4096.
sample-rate = 2048
strain-high-pass = 15
pad-data = 8

Download

Setup the workflow configuration file

As discussed above, the workflow configuration file specifies what events to analyze, what programs to run, and what settings to use for those programs. Since the same general workflow settings can be used for different classes of events, here we have split the workflow configuration file into two separate files, events.ini and workflow_config.ini. The former specifies what events we are analyzing in this run, while the latter specifies all of the other settings. As we will see below, we can simply provide these two files to pycbc_make_inference_workflow’s --config-files argument; it will automatically combine them into a single file.

The events:

[event-gw150914]
label = GW150914+09:50:45UTC
config-files = bbh-uniform_comoving_volume.ini
               marginalized_phase.ini
               emcee_pt-srcmasses_comoving_volume.ini
               o1.ini
config-overrides = data:trigger-time:1126259462.413
                   data:analysis-start-time:-6
                   data:analysis-end-time:2
; We'll run inference twice to double the number of independent samples
nruns = 2

[event-gw170814]
label = GW170814+10:30:43UTC 
config-files = bbh-uniform_comoving_volume.ini
               marginalized_phase.ini
               emcee_pt-srcmasses_comoving_volume.ini
               o2.ini
config-overrides = data:trigger-time:1186741861.533
                   data:analysis-start-time:-6
                   data:analysis-end-time:2
; We'll run inference twice to double the number of independent samples
nruns = 2

Download

The rest of the configuration file:

[workflow]
; basic information used by the workflow generator
file-retention-level = all_triggers
; The start/end times here are just used for file naming. They can be set
; to anything -- they aren't used for anything, and have no effect on the
; analysis. The actual analysis times used are set by the [data] section in
; the configuration files given to pycbc_inference (specified in the events
; config file).
start-time = 1126259200
end-time = 1126259600 

[workflow-ifos]
; The ifos listed here are just used for file naming; it doesn't matter if
; they are not consistent with the actual detectors analyzed.
h1 =
l1 =
v1 =

[extract_posterior]
; Here, we'll ensure that the output parameters are such that mass1 >= mass2
; (and associated spins), change comoving volume into redshift and distance,
; and add mchirp, q, chi_eff, and chi_p to the posterior files.
parameters = 'primary_mass(srcmass1, srcmass2):srcmass1'
             'secondary_mass(srcmass1, srcmass2):srcmass2'
             'primary_spin(srcmass1, srcmass2, spin1_a, spin2_a):spin1_a'
             'primary_spin(srcmass1, srcmass2, spin1_azimuthal, spin2_azimuthal):spin1_azimuthal'
             'primary_spin(srcmass1, srcmass2, spin1_polar, spin2_polar):spin1_polar'
             'secondary_spin(srcmass1, srcmass2, spin1_a, spin2_a):spin2_a'
             'secondary_spin(srcmass1, srcmass2, spin1_azimuthal, spin2_azimuthal):spin2_azimuthal'
             'secondary_spin(srcmass1, srcmass2, spin1_polar, spin2_polar):spin2_polar'
             'mchirp_from_mass1_mass2(srcmass1, srcmass2):srcmchirp'
             'chi_eff_from_spherical(srcmass1, srcmass2, spin1_a, spin1_polar, spin2_a, spin2_polar):chi_eff'
             'chi_p_from_spherical(srcmass1, srcmass2, spin1_a, spin1_azimuthal, spin1_polar, spin2_a, spin2_azimuthal, spin2_polar):chi_p'
             'redshift_from_comoving_volume(comoving_volume):redshift'
             'distance_from_comoving_volume(comoving_volume):distance'
             '*'
force =

[workflow-summary_table]
; Parameters that will be printed in the summary table.
; These must be from the set specified in extract_posterior.
table-params = srcmass1 srcmass2
               srcmchirp 'q_from_mass1_mass2(srcmass1, srcmass2):q'
               chi_eff chi_p
               ra dec delta_tc
               distance redshift
               'snr_from_loglr(loglikelihood-lognl):SNR'
; The additional metadata will be printed below the table. We can print
; anything that is in the posterior files' attrs.
print-metadata = 'trigger_time:$t_0$' 'analyzed_detectors:Detectors'

[workflow-summary_plots]
; Parameter posteriors that will be plotted on the summary page.
; These must be from the set specified in extract_posterior.
; Each plot-group corresponds to a single plot that will be shown on the
; summary page. Generally, these should be limited to 1 or 2 dimensions
; (although this is not enforced); larger corner plots can be put in the
; Posteriors page. The plots for those are set by the [workflow-plot_params]
; section (see below).
; The settings for the posterior plots created here are read from the
; [plot_posterior_summary] section.
plot-group-mass1_mass2 = srcmass1 srcmass2
plot-group-inc_distance = inclination distance
plot-group-chip_chieff = chi_p chi_eff
; Notice that we are not including ra and dec here. The sky map is
; created by [plot_skymap].

[workflow-plot_params]
; Parameter posteriors that will plotted on the "Posteriors" page.
; These must be from the set specified in extract_posterior.
; Each plot-group corresponds to a single plot that will be shown on the
; page. Since the events are split into their own sub-pages, it's ok to make
; large corner plots here (although if they are too large it will be hard to
; make out what each parameter is doing).
; The settings for the posterior plots created here are read from the
; [plot_posterior] section.
; Since we plotted source-frame masses on the summary page, here we'll
; plot detector-frame masses.
plot-group-masses = 'srcmass1*(1+redshift):mass1'
                    'srcmass2*(1+redshift):mass2'
                    'srcmchirp*(1+redshift):mchirp'
                    'q_from_mass1_mass2(srcmass1, srcmass2):q'
plot-group-spins = spin1_a spin2_a
                   spin1_azimuthal spin2_azimuthal
                   spin1_polar spin2_polar
                   chi_eff chi_p
plot-group-extrinsic = ra dec delta_tc polarization inclination distance redshift

[executables]
; paths to executables to use in workflow
inference = ${which:run_pycbc_inference}
extract_posterior = ${which:pycbc_inference_extract_samples}
plot_posterior = ${which:pycbc_inference_plot_posterior}
plot_posterior_summary = ${which:pycbc_inference_plot_posterior}
plot_prior = ${which:pycbc_inference_plot_prior}
table_summary = ${which:pycbc_inference_table_summary}
create_fits_file = ${which:pycbc_inference_create_fits}
plot_skymap = ${which:pycbc_inference_plot_skymap}
plot_spectrum = ${which:pycbc_plot_psd_file}
results_page = ${which:pycbc_make_html_page}
; diagnostic plots
plot_acceptance_rate = ${which:pycbc_inference_plot_acceptance_rate}
plot_samples = ${which:pycbc_inference_plot_samples}

[pegasus_profile]
; +MaxRunTimeHours is needed for running on the ATLAS cluster; comment out
; if your cluster does not need this.
condor|+MaxRunTimeHours = 1

[pegasus_profile-inference]
condor|request_memory = 40G
; +MaxRunTimeHours is needed for running on the ATLAS cluster; comment out
; if your cluster does not need this.
condor|+MaxRunTimeHours = 10
condor|request_cpus = ${inference|nprocesses}

[pegasus_profile-plot_prior]
condor|request_memory = 4G

[pegasus_profile-plot_skymap]
condor|request_memory = 4G

[pegasus_profile-plot_posterior]
condor|request_memory = 4G

[pegasus_profile-plot_posterior_summary]
condor|request_memory = 4G

[pegasus_profile-plot_samples]
condor|request_memory = 4G

[inference]
; Command line options for pycbc_inference.
verbose =
; Set the nprocesses to the number of cores you want each job to use. The
; value you use is cluster dependent.
nprocesses = 32

[plot_posterior_summary]
; These are the command line options that will be passed to
; pycbc_inference_plot_posterior for creating the posterior plots on the
; summary page. These settings will cause density plots to be made.
plot-contours =
plot-marginal =
plot-density =
density-cmap = Blues
contour-color = black

[plot_posterior]
; These are the command line options that will be passed to
; pycbc_inference_plot_posterior for creating the posterior plots on the
; posteriors page. These settings will cause scatter plots to be made showing
; each point in the posterior, colored by the matched-filter SNR.
plot-contours =
plot-marginal =
plot-scatter =
z-arg = snr

[create_fits_file]
; These are the settings for creating a fits file, which is used to produce
; the skymaps. This program needs ligo.skymap to be installed.
; The maxpts option limits the number of points in the posterior that are used
; to create the skymap. This is mostly for speeding up run time. Comment out
; to use all points.
maxpts = 1000
; Since the posterior file stores delta_tc, we need to tell the fits
; file how to calculate tc
tc = 'trigger_time+delta_tc'

[plot_skymap]
; These are settings for creating the skymap. This program requires
; ligo.skymap to be installed. Here, we're just setting the colormap to be
; the same as the posterior density plots, above.
colormap = ${plot_posterior_summary|density-cmap}

[plot_prior]
; This sets command-line options to use for the plot_prior executable. These
; plots are on the "priors" page. The default (giving no options) is to
; plot all of the variable params.

[table_summary]
; This sets command-line options for the table on the summary page. You
; should not need to set anything here.

[plot_spectrum]
; This sets command-line options for the ASD plots on the detector sensitivity
; page. The dyn-range-factor needs to be set to 1.
dyn-range-factor = 1

[plot_acceptance_rate]
; This sets command-line options for the acceptance rate diagnostic plots.
; This should only be used for MCMC samplers. You do not need to set anything
; here for this plot.

[plot_samples]
; This sets command-line options for the plot of samples chains.
; This should only be used for MCMC samplers. Here, we are telling it to plot
; all chains, and to show every single iteration.
chains = all
thin-start = 0
thin-interval = 1

[results_page]
; This sets options for creating the results page. You may want to change
; the analysis-title to make it more descriptive.
analysis-title = "Inference results"

Download

Notes:

  • Since the [executables] section contains entries for create_fits_file and plot_skymap, the workflow will try to create sky maps. This requires a Python 3.x environment and ligo.skymap to be installed. If you have not installed ligo.skymap yet, do so by running:

    pip install ligo.skymap
    
  • If you do not want to create sky maps, or are running in a Python 2.7 environment, you can turn this off by simply commenting out or removing create_fits_file and plot_skymap from the [executables] section.

  • The number of cores that will be used by pycbc_inference is set by the nprocesses argument in the [inference] section. You should set this to the number of cores you expect to be able to get on your cluster. In the configuration presented here, parallelization is limited to shared-memory cores on a single node. (It is possible to use MPI to parallelize over a larger number of cores, but that requires special condor settings that must be implemented by your cluster admins; that is outside the scope of these instructions.)

  • Notice that the number of processes that pycbc_inference will use is referenced by the condor|request_cpus argument in the [pegasus_profile-inference] section. This argument is what tells condor how many cores to assign to the job, and so sets the number of cores pycbc_inference will actually get. Generally, you want this to be the same as what is fed to pycbc_inference’s nprocesses option.

The workflow_config.ini file can be used with any of the MCMC samplers when analyzing a gravitational-wave signal that involves the parameters mentioned in the file. If you wanted to analyze other binary black holes, you could use this same file, simply changing the events.ini file to point to the events you want to analyze.

Generate the workflow

Assuming that you have downloaded all of the configuration files to the same directory, you can generate the workflow by running the following script (referred to below as create_workflow.sh):

set -e

WORKFLOW_NAME=inference-gw150914_gw170814
# Set the HTML_DIR to point to your public html page. This is where the results
# page will be written.
HTML_DIR=''
if [ "${HTML_DIR}" == '' ]; then
    echo "Please set an HTML_DIR"
    exit 1
fi
SEED=978241
pycbc_make_inference_workflow \
    --seed ${SEED} \
    --config-files workflow_config.ini events.ini \
    --workflow-name ${WORKFLOW_NAME} \
    --config-overrides results_page:output-path:${HTML_DIR}/${WORKFLOW_NAME}

Download

Note that you need to set the HTML_DIR before running. This tells the workflow where to save the results page when done. You can also change WORKFLOW_NAME if you like.

You should also change the SEED every time you create a different workflow. This sets the seed that is passed to pycbc_inference (it is set here, at the workflow level, because it will be incremented for every pycbc_inference job that is run in the workflow).

After the workflow generator has finished, it will have created a directory named ${WORKFLOW_NAME}-output. This contains the dax and all necessary files to run the workflow.

Plan and execute the workflow

Change directory into the ${WORKFLOW_NAME}-output directory:

cd ${WORKFLOW_NAME}-output

If you are on the ATLAS cluster (at AEI Hannover) or on an LDG cluster, you need to define an accounting group tag (talk to your cluster admins if you do not know what this is). Once you know what accounting-group tag to use, plan and submit the workflow with:

# submit workflow
pycbc_submit_dax --dax ${WORKFLOW_NAME}.dax \
    --no-grid \
    --no-create-proxy \
    --enable-shared-filesystem \
    --accounting-group ${ACCOUNTING_GROUP}

Here, ${ACCOUNTING_GROUP} is the appropriate tag for your workflow.

Once it is running, you can monitor the status of the workflow by running ./status from within the ${WORKFLOW_NAME}-output directory. If your workflow fails for any reason, you can see what caused the failure by running ./debug. If you need to stop the workflow at any point, run ./stop. To resume a workflow, run ./start. If the pycbc_inference jobs were still running, and they had checkpointed, they will resume from their last checkpoint upon restart.
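
For quick reference, the monitoring commands (all run from within the ${WORKFLOW_NAME}-output directory) are:

./status   # check the workflow's progress
./debug    # see what caused a failure
./stop     # stop the workflow
./start    # resume; checkpointed pycbc_inference jobs pick up from their last checkpoint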

Results page

When the workflow has completed successfully, it will write out the results page to the directory you specified via HTML_DIR in the create_workflow.sh script. You can see what the results page will look like at the example inference-gw150914_gw170814.

Example: GW150914 and GW170814 with dynesty

In this example, we repeat the above analysis, but using the dynesty sampler. We can use the same prior, model, and O1 and O2 data configuration files as above. The new files that we need are:

  • The sampler configuration file for dynesty:

;==============================================================================
;
;                       Dynesty settings
;
;==============================================================================
;
; The following provides standard settings for the dynesty sampler.
;
[sampler]
name = dynesty
dlogz = 0.1
nlive = 2000

Download

  • An events file which uses dynesty:

[event-gw150914]
label = GW150914+09:50:45UTC
config-files = bbh-uniform_comoving_volume.ini
               marginalized_phase.ini
               dynesty.ini
               o1.ini
config-overrides = data:trigger-time:1126259462.413
                   data:analysis-start-time:-8
                   data:analysis-end-time:2
; We can run multiple instances of inference to accumulate more samples by
; setting nruns. This is useful for emcee_pt. However, dynesty generally
; produces enough samples in a single run, so we'll leave this commented out.
; (The default is to do 1 run.)
;nruns = 2

[event-gw170814]
label = GW170814+10:30:43UTC 
config-files = bbh-uniform_comoving_volume.ini
               marginalized_phase.ini
               dynesty.ini
               o2.ini
config-overrides = data:trigger-time:1186741861.533
                   data:analysis-start-time:-8
                   data:analysis-end-time:2

Download

Note that here, we are not running pycbc_inference multiple times. This is because a single run of dynesty with the settings we are using (2000 live points) produces a large number of samples (O(10 000)).

We also need a slightly different workflow configuration file. The only difference from the workflow configuration file above is that the diagnostic plot executables (plot_acceptance_rate and plot_samples) have been removed. This is because these diagnostics do not apply to dynesty, which is a nested sampler; a sketch of the change is given below. As above, set the nprocesses argument in the [inference] section to the number of cores that works for your cluster.
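
The following sketch shows the change relative to the workflow_config.ini above; the rest of the file is unchanged. The corresponding [plot_acceptance_rate], [plot_samples], and [pegasus_profile-plot_samples] sections can also be removed.

[executables]
; ... all other entries are the same as in workflow_config.ini above ...
; MCMC-only diagnostics, not used with dynesty -- comment out or delete:
;plot_acceptance_rate = ${which:pycbc_inference_plot_acceptance_rate}
;plot_samples = ${which:pycbc_inference_plot_samples}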

Note that we could have run both the emcee_pt analysis, above, and the dynesty analysis together in a single workflow. However, to do so, we would need to remove any diagnostic plots that are unique to each sampler.

Once you have downloaded the necessary files, create the workflow and launch it using the same create_workflow.sh script and pycbc_submit_dax commands as above, making sure to change the WORKFLOW_NAME and SEED.
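
For example, assuming you saved the modified workflow configuration and events file as workflow_config-dynesty.ini and events-dynesty.ini (these names are just illustrative; use whatever names you chose):

set -e

WORKFLOW_NAME=inference-dynesty-gw150914_gw170814
# Set the HTML_DIR to point to your public html page, as before.
HTML_DIR=''
if [ "${HTML_DIR}" == '' ]; then
    echo "Please set an HTML_DIR"
    exit 1
fi
# Use a different seed than in the emcee_pt workflow.
SEED=412978
pycbc_make_inference_workflow \
    --seed ${SEED} \
    --config-files workflow_config-dynesty.ini events-dynesty.ini \
    --workflow-name ${WORKFLOW_NAME} \
    --config-overrides results_page:output-path:${HTML_DIR}/${WORKFLOW_NAME}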

This will produce a results page that looks like the example inference-dynesty-gw150914_gw170814.