pycbc_make_inference_workflow: A parameter estimation workflow generator
Introduction
The executable pycbc_make_inference_workflow is a workflow generator to
set up a parameter estimation analysis. It can be set up to run on one or more
events at once. For each event, the workflow:
1. Runs pycbc_inference. If desired, you can run multiple independent
   instances of pycbc_inference on the same event.
2. Extracts a posterior file using pycbc_inference_extract_samples. If
   multiple instances of pycbc_inference were run on the same event, the
   samples from all of the runs will be combined into a single posterior file.
   You can also have derived parameters written out to the posterior file.
3. Makes various posterior plots and tables. The prior is also plotted. If you
   are analyzing gravitational-wave data, a plot of the power spectral density
   (PSD) used for each event is also created. If you are working in a Python
   3.x environment, you can optionally have the workflow produce a sky map for
   each event (this requires ligo.skymap to be installed).
4. Optionally creates sampler-dependent diagnostic plots.
5. Generates a results HTML page that gathers all of the results.
The workflow generator requires a configuration file that tells it what plots to make, what parameters to produce posteriors for, which events to analyze, and any other settings to use for the various executables that are run.
For each event, one or more inference configuration files (the file(s) passed
to pycbc_inference) must also be provided. These are separate from the
workflow configuration file, as they describe how to analyze each event. You
tell the workflow how many events to analyze and which inference configuration
files to use for each event via [event-{label}] sections in the workflow
configuration file. Here, {label} is a unique label for each event.
To illustrate how to set up and use a workflow, below we provide an example of how to set up the workflow to analyze two binary black hole events, GW150914 and GW170814, at once.
Example: GW150914 and GW170814 with emcee_pt
In this example we set up a workflow to analyze GW150914 and GW170814 using
emcee_pt. We will use a prior that is uniform in comoving volume and
uniform in source masses. As we will be using the IMRPhenomPv2 waveform
approximant, we will use the marginalized_phase Gaussian noise model.
This workflow will produce a results page that looks like the example inference-gw150914_gw170814.
The inference configuration files we will use can all be found in the pycbc
examples directory. Below, we provide instructions on what files need
to be downloaded, and how to set up and run the workflow.
Get the inference configuration files
We need the configuration files for pycbc_inference. These define the
prior, model, sampler, and data to use for each event.
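If you have a clone of the PyCBC source code, one way to collect these files is to copy them out of the examples directory. This is only a sketch: the examples/inference subdirectory layout (priors/, models/, samplers/, data/) is an assumption here and may differ between releases, so adjust the paths to wherever the files live in your checkout.

# Copy the example inference configuration files into the current directory.
# Replace /path/to/pycbc with the location of your PyCBC source checkout.
cp /path/to/pycbc/examples/inference/priors/bbh-uniform_comoving_volume.ini .
cp /path/to/pycbc/examples/inference/models/marginalized_phase.ini .
cp /path/to/pycbc/examples/inference/samplers/emcee_pt-srcmasses_comoving_volume.ini .
cp /path/to/pycbc/examples/inference/data/o1.ini /path/to/pycbc/examples/inference/data/o2.ini .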
The prior:
;==============================================================================
;
; Standard BBH Prior
;
;==============================================================================
;
; This configuration file provides a standard prior for binary-black holes
; (BBH). It uses a uniform prior on *source* masses, along with a uniform
; prior in comoving volume. Waveform transforms are included to convert the
; source masses into detector-frame masses using a standard cosmology
; (Planck 2015). The minimum and maximum volumes used correspond to
; luminosity distances of ~10Mpc and ~1.5Gpc, respectively. It can therefore
; be used with BBH in O1-O2. To use for future detectors, simply change the
; volume limits.
;
; The coa_phase is not varied, so this has to be used with a model that
; marginalizes the phase automatically (e.g. the marginalized_phase or relbin
; models). If you are not using a model that marginalizes the phase, uncomment
; the coa_phase in the [variable_params], along with the [prior-coa_phase]
; section.
;
; The mass range used is 10-80, and so is fine for GW150914-like BBH. For
; lower-mass BBH, the prior range should be decreased. Keep in mind
; that lowering the mass prior increases the duration of the longest waveform
; admitted by the prior (meaning that you may need to change your
; analysis-start-time in your data section if you do that).
;
; The starting frequency of the waveform approximant is set to 20Hz (the
; f_lower and f_ref settings in the [static_params]). This is OK to use
; for the O1-O3 LIGO and Virgo detectors. With this lower-frequency cutoff
; and the lower-bound of the mass prior of 10, the longest waveform that may
; be generated is ~6s. Suggested analysis-start and end-time settings are -6
; and 2 (with respect to the trigger-time), respectively.
;
; You may wish to lower the lower frequency cutoff for future detectors,
; for which the PSD has better lower-frequency performance.
; Keep in mind that decreasing the lower-frequency cutoff will make the
; waveforms have longer duration in the time domain, and so the analysis
; start time will need to be adjusted.
;
; No [data], [model], or [sampler] sections are provided here. This should be
; used in tandem with additional configuration files that provide those
; sections.
[variable_params]
delta_tc =
; Note that we call the masses srcmass[X]. This is because the waveform
; generator assumes that parameters called mass[X] are detector-frame masses.
; We therefore need to call the source masses something different; we choose
; "srcmass" here, but they could be called anything. In the waveform transforms
; sections below, we convert these to detector-frame masses.
srcmass1 =
srcmass2 =
spin1_a =
spin1_azimuthal =
spin1_polar =
spin2_a =
spin2_azimuthal =
spin2_polar =
comoving_volume =
inclination =
polarization =
ra =
dec =
; Uncomment this if you are not using a model that marginalizes over phase.
;coa_phase =
[static_params]
approximant = IMRPhenomPv2
f_lower = 20
f_ref = 20
; The trigger time is used with delta_tc to get the coalescence time tc. We'll
; get the trigger time from the data section (provided in a separate file).
trigger_time = ${data|trigger-time}
;-----------------------------------------------------------------------------
;
; Intrinsic parameters
;
;-----------------------------------------------------------------------------
[prior-srcmass1]
name = uniform
min-srcmass1 = 10
max-srcmass1 = 80
[prior-srcmass2]
name = uniform
min-srcmass2 = 10
max-srcmass2 = 80
[prior-spin1_a]
name = uniform
min-spin1_a = 0.0
max-spin1_a = 0.99
[prior-spin1_polar+spin1_azimuthal]
name = uniform_solidangle
polar-angle = spin1_polar
azimuthal-angle = spin1_azimuthal
[prior-spin2_a]
name = uniform
min-spin2_a = 0.0
max-spin2_a = 0.99
[prior-spin2_polar+spin2_azimuthal]
name = uniform_solidangle
polar-angle = spin2_polar
azimuthal-angle = spin2_azimuthal
; The waveform generator expects spins to be in cartesian coordinates, with
; names spin(1|2)(x|y|z). We therefore need to provide a waveform transform
; that converts the spherical coordinates in which we have defined the spin
; prior into cartesian coordinates.
[waveform_transforms-spin1x+spin1y+spin1z]
name = spherical_to_cartesian
x = spin1x
y = spin1y
z = spin1z
radial = spin1_a
polar = spin1_polar
azimuthal = spin1_azimuthal
[waveform_transforms-spin2x+spin2y+spin2z]
name = spherical_to_cartesian
x = spin2x
y = spin2y
z = spin2z
radial = spin2_a
polar = spin2_polar
azimuthal = spin2_azimuthal
;-----------------------------------------------------------------------------
;
; Extrinsic parameters
;
;-----------------------------------------------------------------------------
[prior-delta_tc]
name = uniform
; We'll use +/-0.1s around the estimated coalescence (trigger) time.
min-delta_tc = -0.1
max-delta_tc = 0.1
[waveform_transforms-tc]
; The waveform generator needs tc, which we calculate here.
name = custom
inputs = trigger_time, delta_tc
tc = trigger_time + delta_tc
[prior-inclination]
name = sin_angle
; Uncomment this section if you are not using a model that marginalizes over
; the phase.
;[prior-coa_phase]
;name = uniform_angle
[prior-ra+dec]
name = uniform_sky
[prior-polarization]
name = uniform_angle
[prior-comoving_volume]
name = uniform
; These limits correspond to luminosity distances of ~[10, 1500) Mpc. Change
; if you are analyzing detections which are more than ~1Gpc away.
min-comoving_volume = 5e3
max-comoving_volume = 9e9
; The following [waveform_transforms] sections convert the comoving volume
; to luminosity distance and the source masses to detector frame masses.
; The latter is done by calculating redshift from the comoving volume first.
; The order that transforms need to be applied is figured out automatically by
; the code, so it doesn't matter what order we put them here, as long as we
; provide transforms for all intermediate steps.
[waveform_transforms-redshift]
name = custom
inputs = comoving_volume
redshift = redshift_from_comoving_volume(comoving_volume)
[waveform_transforms-distance]
name = custom
inputs = comoving_volume
distance = distance_from_comoving_volume(comoving_volume)
[waveform_transforms-mass1]
name = custom
inputs = srcmass1, redshift
mass1 = srcmass1 * (1 + redshift)
[waveform_transforms-mass2]
name = custom
inputs = srcmass2, redshift
mass2 = srcmass2 * (1 + redshift)
The model:
;==============================================================================
;
; Gaussian noise model with marginalized phase
;
;==============================================================================
;
; This provides settings for the marginalized Gaussian noise model. The
; low-frequency-cutoff of 20Hz is adequate for the O1-O3 LIGO and Virgo
; detectors. Change as appropriate for future detectors.
;
; The check-for-valid-times and shift-psd-times-to-valid options mean that
; the model will check to see if there are any data quality flags on during
; the requested analysis times (determined by the [data] section). If there
; are any flags on during the analysis times for a detector, that detector
; will automatically be removed from the analysis (an error will be raised if
; all detectors are removed). If you do not want this to happen, uncomment
; the err-on-missing-detectors option. In that case, an error will be raised
; if any detectors have a data quality flag on. The shift-psd-times-to-valid
; argument will cause the times used for estimating the PSD (determined by
; the psd-start|end-time arguments in the [data] section) to be shifted left
; or right to avoid any data quality flags. A shift up to +/- dT/2 will be
; tried, where dT is the difference between psd-end-time and psd-start-time.
; If no valid data can be found even with the maximum shift, the detector
; will be removed from the analysis. To check for valid-times, the dq-* options
; are used in the strain model. See
; http://pycbc.org/pycbc/latest/html/inference.html#setting-data for details.
;
; The ignore-failed-waveforms option tells the model to treat points in
; parameter space that cause the waveform generator to fail as having 0
; likelihood. This may be necessary for newer precessing models, in which
; the entire parameter space has not been fully tested. Note, however, that
; in that case you will not be able to tell if some parameters have been ruled
; out because the data disfavors them, or because the model failed. For this
; reason it is best to let the code raise an error (i.e., leave the option
; commented out), and only ignore these errors once you are confident you know
; the reason.
;
[model]
name = marginalized_phase
low-frequency-cutoff = 20
check-for-valid-times =
shift-psd-times-to-valid =
;err-on-missing-detectors =
;ignore-failed-waveforms =
The sampler:
;==============================================================================
;
; Emcee PT settings for CBC, comoving volume
;
;==============================================================================
;
; The following provides standard settings for emcee_pt when analyzing a
; compact binary merger. This assumes that the prior is specified in terms
; of the source masses (srcmass1, srcmass2) and comoving volume
; (comoving_volume). To speed up convergence, the source masses are sampled in
; chirp mass and mass ratio, and the comoving volume is sampled in the log.
;
; We set the number of effective samples to 1500 because we've found that
; emcee_pt struggles to acquire more than ~8 independent samples per walker.
;
[sampler]
name = emcee_pt
nwalkers = 200
ntemps = 20
effective-nsamples = 1500
checkpoint-interval = 2000
max-samples-per-chain = 1000
[sampler-burn_in]
burn-in-test = nacl & max_posterior
[sampling_params]
srcmass1, srcmass2 = mchirp, q
comoving_volume = logv
[sampling_transforms-mchirp+q]
name = mass1_mass2_to_mchirp_q
mass1_param = srcmass1
mass2_param = srcmass2
[sampling_transforms-logv]
name = log
inputvar = comoving_volume
outputvar = logv
The data: We also need configuration files for the data. Since GW150914 occurred during O1 while GW170814 occurred during O2, we need both the standard O1 and O2 files:
;==============================================================================
;
; Settings for analyzing O1 data
;
;==============================================================================
;
; This provides standard settings for analyzing H1 and L1 data in O1.
; It uses "OVERRIDE" for parameters that are event-specific. Replace OVERRIDE
; either by editing this file, or using the config-override option in
; pycbc_inference.
[data]
instruments = H1 L1
trigger-time = OVERRIDE
analysis-start-time = OVERRIDE
analysis-end-time = OVERRIDE
psd-estimation = median-mean
psd-start-time = -256
psd-end-time = 256
psd-inverse-length = 8
psd-segment-length = 8
psd-segment-stride = 4
; If you are running on the Atlas cluster, an LDG cluster, or any computer
; with a ligo-data-server, you can use the frame-type argument to automatically
; locate the location of the frame files containing the data. If you are not
; running on one of those computers, download the necessary data from GWOSC
; (gwosc.org), remove the frame-type argument, and uncomment
; frame-files, pointing the latter to the files you downloaded.
;frame-files = H1:/PATH/TO/DOWNLOADED/H1FRAME.gwf L1:/PATH/TO/DOWNLOADED/L1FRAME.gwf
frame-type = H1:H1_LOSC_16_V1 L1:L1_LOSC_16_V1
channel-name = H1:GWOSC-16KHZ_R1_STRAIN L1:GWOSC-16KHZ_R1_STRAIN
; A sample rate of 2048 is sufficient for BBH. If you are analyzing a BNS or
; NSBH, change to 4096.
sample-rate = 2048
strain-high-pass = 15
pad-data = 8
;==============================================================================
;
; Settings for analyzing O2 data
;
;==============================================================================
;
; This provides standard settings for analyzing H1, L1, and V1 data in O2.
; It uses "OVERRIDE" for parameters that event-specific. Replace OVERRIDE
; either by editing this file, or using the config-override option in
; pycbc_inference.
;
[data]
instruments = H1 L1 V1
trigger-time = OVERRIDE
analysis-start-time = OVERRIDE
analysis-end-time = OVERRIDE
psd-estimation = median-mean
psd-start-time = -256
psd-end-time = 256
psd-inverse-length = 8
psd-segment-length = 8
psd-segment-stride = 4
; If you are running on the Atlas cluster, an LDG cluster, or any computer
; with a ligo-data-server, you can use the frame-type argument to automatically
; locate the location of the frame files containing the data. If you are not
; running on one of those computers, download the necessary data from GWOSC
; (gwosc.org), remove the frame-type argument, and uncomment
; frame-files, pointing the latter to the files you downloaded.
frame-type = H1:H1_GWOSC_O2_16KHZ_R1 L1:L1_GWOSC_O2_16KHZ_R1 V1:V1_GWOSC_O2_16KHZ_R1
;frame-files = H1:/PATH/TO/DOWNLOADED/H1FRAME.gwf L1:/PATH/TO/DOWNLOADED/L1FRAME.gwf V1:/PATH/TO/DOWNLOADED/V1FRAME.gwf
channel-name = H1:GWOSC-16KHZ_R1_STRAIN L1:GWOSC-16KHZ_R1_STRAIN V1:GWOSC-16KHZ_R1_STRAIN
; A sample rate of 2048 is sufficient for BBH. If you are analyzing a BNS or
; NSBH, change to 4096.
sample-rate = 2048
strain-high-pass = 15
pad-data = 8
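Within the workflow, the OVERRIDE values above are filled in automatically from the config-overrides listed in the events file (see the next section). For reference, here is a hedged sketch of how the same overrides could be passed to a standalone pycbc_inference run on GW150914; the option names follow the standard pycbc_inference command line, but check pycbc_inference --help for your installed version, and the seed, output file name, and core count are arbitrary examples.

pycbc_inference --verbose \
    --seed 39472 \
    --config-file bbh-uniform_comoving_volume.ini marginalized_phase.ini \
        emcee_pt-srcmasses_comoving_volume.ini o1.ini \
    --config-overrides data:trigger-time:1126259462.413 \
        data:analysis-start-time:-6 \
        data:analysis-end-time:2 \
    --nprocesses 8 \
    --output-file inference-gw150914.hdf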
Set up the workflow configuration file
As discussed above, the workflow configuration file specifies what events to
analyze, what programs to run, and what settings to use for those programs.
Since the same general workflow settings can be used for different classes of
events, here we have split the workflow configuration file into two separate
files, events.ini and workflow_config.ini. The former specifies what
events we are analyzing in this run, while the latter specifies all of the
other settings. As we will see below, we can simply provide these two files to
pycbc_make_inference_workflow's --config-files argument; it will
automatically combine them into a single file.
The events:
[event-gw150914]
label = GW150914+09:50:45UTC
config-files = bbh-uniform_comoving_volume.ini
marginalized_phase.ini
emcee_pt-srcmasses_comoving_volume.ini
o1.ini
config-overrides = data:trigger-time:1126259462.413
data:analysis-start-time:-6
data:analysis-end-time:2
; We'll run inference twice to double the number of independent samples
nruns = 2
[event-gw170814]
label = GW170814+10:30:43UTC
config-files = bbh-uniform_comoving_volume.ini
marginalized_phase.ini
emcee_pt-srcmasses_comoving_volume.ini
o2.ini
config-overrides = data:trigger-time:1186741861.533
data:analysis-start-time:-6
data:analysis-end-time:2
; We'll run inference twice to double the number of independent samples
nruns = 2
The rest of the configuration file:
[workflow]
; basic information used by the workflow generator
file-retention-level = all_triggers
; The start/end times here are just used for file naming. They can be set
; to anything -- they aren't used for anything, and have no effect on the
; analysis. The actual analysis times used are set by the [data] section in
; the configuration files given to pycbc_inference (specified in the events
; config file).
start-time = 1126259200
end-time = 1126259600
[workflow-ifos]
; The ifos listed here are just used for file naming; it doesn't matter if
; they are not consistent with the actual detectors analyzed.
h1 =
l1 =
v1 =
[extract_posterior]
; Here, we'll ensure that the output parameters are such that mass1 >= mass2
; (and associated spins), change comoving volume into redshift and distance,
; and add mchirp, q, chi_eff, and chi_p to the posterior files.
parameters = 'primary_mass(srcmass1, srcmass2):srcmass1'
'secondary_mass(srcmass1, srcmass2):srcmass2'
'primary_spin(srcmass1, srcmass2, spin1_a, spin2_a):spin1_a'
'primary_spin(srcmass1, srcmass2, spin1_azimuthal, spin2_azimuthal):spin1_azimuthal'
'primary_spin(srcmass1, srcmass2, spin1_polar, spin2_polar):spin1_polar'
'secondary_spin(srcmass1, srcmass2, spin1_a, spin2_a):spin2_a'
'secondary_spin(srcmass1, srcmass2, spin1_azimuthal, spin2_azimuthal):spin2_azimuthal'
'secondary_spin(srcmass1, srcmass2, spin1_polar, spin2_polar):spin2_polar'
'mchirp_from_mass1_mass2(srcmass1, srcmass2):srcmchirp'
'chi_eff_from_spherical(srcmass1, srcmass2, spin1_a, spin1_polar, spin2_a, spin2_polar):chi_eff'
'chi_p_from_spherical(srcmass1, srcmass2, spin1_a, spin1_azimuthal, spin1_polar, spin2_a, spin2_azimuthal, spin2_polar):chi_p'
'redshift_from_comoving_volume(comoving_volume):redshift'
'distance_from_comoving_volume(comoving_volume):distance'
'*'
force =
[workflow-summary_table]
; Parameters that will be printed in the summary table.
; These must be from the set specified in extract_posterior.
table-params = srcmass1 srcmass2
srcmchirp 'q_from_mass1_mass2(srcmass1, srcmass2):q'
chi_eff chi_p
ra dec delta_tc
distance redshift
'snr_from_loglr(loglikelihood-lognl):SNR'
; The additional metadata will be printed below the table. We can print
; anything that is in the posterior files' attrs.
print-metadata = 'trigger_time:$t_0$' 'analyzed_detectors:Detectors'
[workflow-summary_plots]
; Parameter posteriors that will plotted on the summary page.
; These must be from the set specified in extract_posterior.
; Each plot-group corresponds to a single plot that will be shown on the
; summary page. Generally, these should be limited to 1 or 2 dimensions
; (although this is not enforced); larger corner plots can be put in the
; Posteriors page. The plots for those are set by the [workflow-plot_params]
; section (see below).
; The settings for the posterior plots created here are read from the
; [plot_posterior_summary] section.
plot-group-mass1_mass2 = srcmass1 srcmass2
plot-group-inc_distance = inclination distance
plot-group-chip_chieff = chi_p chi_eff
; Notice that we are not including ra and dec here. The sky map is
; created by [plot_skymap].
[workflow-plot_params]
; Parameter posteriors that will plotted on the "Posteriors" page.
; These must be from the set specified in extract_posterior.
; Each plot-group corresponds to a single plot that will be shown on the
; page. Since the events are split into their own sub-pages, it's ok to make
; large corner plots here (although too large and it will be hard to make
; out what each parameter is doing).
; The settings for the posterior plots created here are read from the
; [plot_posterior] section.
; Since we plotted source-frame masses on the summary page, here we'll
; plot detector-frame masses.
plot-group-masses = 'srcmass1*(1+redshift):mass1'
'srcmass2*(1+redshift):mass2'
'srcmchirp*(1+redshift):mchirp'
'q_from_mass1_mass2(srcmass1, srcmass2):q'
plot-group-spins = spin1_a spin2_a
spin1_azimuthal spin2_azimuthal
spin1_polar spin2_polar
chi_eff chi_p
plot-group-extrinsic = ra dec delta_tc polarization inclination distance redshift
[executables]
; paths to executables to use in workflow
inference = ${which:run_pycbc_inference}
extract_posterior = ${which:pycbc_inference_extract_samples}
plot_posterior = ${which:pycbc_inference_plot_posterior}
plot_posterior_summary = ${which:pycbc_inference_plot_posterior}
plot_prior = ${which:pycbc_inference_plot_prior}
table_summary = ${which:pycbc_inference_table_summary}
create_fits_file = ${which:pycbc_inference_create_fits}
plot_skymap = ${which:pycbc_inference_plot_skymap}
plot_spectrum = ${which:pycbc_plot_psd_file}
results_page = ${which:pycbc_make_html_page}
; diagnostic plots
plot_acceptance_rate = ${which:pycbc_inference_plot_acceptance_rate}
plot_samples = ${which:pycbc_inference_plot_samples}
[pegasus_profile]
; +MaxRunTimeHours is needed for running on the ATLAS cluster; comment out
; if your cluster does not need this.
condor|+MaxRunTimeHours = 1
[pegasus_profile-inference]
condor|request_memory = 40G
; +MaxRunTimeHours is needed for running on the ATLAS cluster; comment out
; if your cluster does not need this.
condor|+MaxRunTimeHours = 10
condor|request_cpus = ${inference|nprocesses}
[pegasus_profile-plot_prior]
condor|request_memory = 4G
[pegasus_profile-plot_skymap]
condor|request_memory = 4G
[pegasus_profile-plot_posterior]
condor|request_memory = 4G
[pegasus_profile-plot_posterior_summary]
condor|request_memory = 4G
[pegasus_profile-plot_samples]
condor|request_memory = 4G
[inference]
; Command line options for pycbc_inference.
verbose =
; Set the nprocesses to the number of cores you want each job to use. The
; value you use is cluster dependent.
nprocesses = 32
[plot_posterior_summary]
; These are the command line options that will be passed to
; pycbc_inference_plot_posterior for creating the posterior plots on the
; summary page. These settings will cause density plots to be made.
plot-contours =
plot-marginal =
plot-density =
density-cmap = Blues
contour-color = black
[plot_posterior]
; These are the command line options that will be passed to
; pycbc_inference_plot_posterior for creating the posterior plots on the
; posteriors page. These settings will cause scatter plots to be made showing
; each point in the posterior, colored by the matched-filter SNR.
plot-contours =
plot-marginal =
plot-scatter =
z-arg = snr
[create_fits_file]
; These are the settings for creating a fits file, which is used to produce
; the skymaps. This program needs ligo.skymap to be installed.
; The maxpts option limits the number of points in the posterior that are used
; to create the skymap. This is mostly for speeding up run time. Comment out
; to use all points.
maxpts = 1000
; Since the posterior file stores delta_tc, we need to tell the fits
; file how to calculate tc
tc = 'trigger_time+delta_tc'
[plot_skymap]
; These are settings for creating the skymap. This program requires
; ligo.skymap to be installed. Here, we're just setting the colormap to be
; the same as the posterior density plots, above.
colormap = ${plot_posterior_summary|density-cmap}
[plot_prior]
; This sets command-line options to use for the plot prior function. These
; plots are on the "priors" page. The default (giving no options) is to
; plot all of the variable params.
[table_summary]
; This sets command-line options for the table on the summary page. You
; should not need to set anything here.
[plot_spectrum]
; This sets command-line options for the ASD plots on the detector sensitivity
; page. The dyn-range-factor needs to be set to 1.
dyn-range-factor = 1
[plot_acceptance_rate]
; This sets command-line options for the acceptance rate diagnostic plots.
; This should only be used for MCMC samplers. You do not need to set anything
; here for this plot.
[plot_samples]
; This sets command-line options for the plot of samples chains.
; This should only be used for MCMC samplers. Here, we are telling it to plot
; all chains, and to show every single iteration.
chains = all
thin-start = 0
thin-interval = 1
[results_page]
; This sets settings for creating the results page. You may want to change
; the analysis title, to make it more descriptive.
analysis-title = "Inference results"
Notes:
1. Since the [executables] section contains entries for create_fits_file and
   plot_skymap, the workflow will try to create sky maps. This requires a
   Python 3.x environment and ligo.skymap to be installed. If you have not
   installed ligo.skymap yet, do so by running:

   pip install ligo.skymap

   If you do not want to create sky maps, or are running in a Python 2.7
   environment, you can turn this off by simply commenting out or removing
   create_fits_file and plot_skymap from the [executables] section.
2. The number of cores that will be used by pycbc_inference is set by the
   nprocesses argument in the [inference] section. You should set this to the
   number of cores you expect to be able to get on your cluster. In the
   configuration presented here, we are limited to shared-memory cores. (It is
   possible to run using MPI in order to parallelize over a larger number of
   cores, but that requires special condor settings that must be implemented
   by your cluster admins. That is outside the scope of these instructions.)
3. Notice that the number of processes that pycbc_inference will use is
   referenced by the condor|request_cpus argument in the
   [pegasus_profile-inference] section. This argument is what tells condor how
   many cores to assign to the job, and so sets the actual number of resources
   pycbc_inference will get. Generally, you want this to be the same as what
   is fed to pycbc_inference's nprocesses option.
The workflow_config.ini
file can be used with any of the MCMC samplers when
analyzing a gravitational-wave signal that involves the parameters mentioned in the
file. If you wanted to analyze other binary black holes, you could use this
same file, simply changing the events.ini
file to point to the events
you want to analyze.
Generate the workflow
Assuming that you have downloaded all of the configuration files to the same directory, you can generate the workflow by running the following script (save it as create_workflow.sh; it is referred to by that name below):
set -e
WORKFLOW_NAME=inference-gw150914_gw170814
# Set the HTML_DIR to point to your public html page. This is where the results
# page will be written.
HTML_DIR=''
if [ "${HTML_DIR}" == '' ]; then
    echo "Please set an HTML_DIR"
    exit 1
fi
SEED=978241
pycbc_make_inference_workflow \
    --seed ${SEED} \
    --config-files workflow_config.ini events.ini \
    --workflow-name ${WORKFLOW_NAME} \
    --config-overrides results_page:output-path:${HTML_DIR}/${WORKFLOW_NAME}
Note that you need to set the HTML_DIR
before running. This tells the
workflow where to save the results page when done. You can also change
WORKFLOW_NAME
if you like.
You should also change the SEED
every time you create a different workflow.
This sets the seed that is passed to pycbc_inference
(you set it here
because it will be incremented for every pycbc_inference
job that will be
run in the workflow).
After the workflow generator has finished, it will have created a directory
named ${WORKFLOW_NAME}-output. This contains the dax and all necessary files
to run the workflow.
Plan and execute the workflow
Change directory into the ${WORKFLOW_NAME}-output
directory:
cd ${WORKFLOW_NAME}-output
If you are on the ATLAS cluster (at AEI Hannover) or on an LDG cluster, you need to define an accounting group tag (talk to your cluster admins if you do not know what this is). Once you know what accounting-group tag to use, plan and submit the workflow with:
# submit workflow
pycbc_submit_dax --dax ${WORKFLOW_NAME}.dax \
    --no-grid \
    --no-create-proxy \
    --enable-shared-filesystem \
    --accounting-group ${ACCOUNTING_GROUP}
Here, ${ACCOUNTING_GROUP}
is the appropriate tag for your workflow.
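For example, you could define it just before submitting; the value below is a placeholder, not a real tag, so use the one your cluster admins give you.

ACCOUNTING_GROUP=your.accounting.group.tag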
Once it is running, you can monitor the status of the workflow by running
./status from within the ${WORKFLOW_NAME}-output directory. If your
workflow fails for any reason, you can see what caused the failure by running
./debug. If you need to stop the workflow at any point, run ./stop.
To resume a workflow, run ./start. If the pycbc_inference jobs were
still running, and they had checkpointed, they will resume from their last
checkpoint upon restart.
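For quick reference, the helper commands described above are all run from inside the output directory:

cd ${WORKFLOW_NAME}-output
./status   # monitor the workflow's progress
./debug    # see what caused a failure
./stop     # stop the workflow
./start    # resume; checkpointed pycbc_inference jobs restart from their last checkpoint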
Results page
When the workflow has completed successfully it will write out the results
page to the directory you specified in the create_workflow.sh script.
You can see what the results page will look like at the example
inference-gw150914_gw170814.
Example: GW150914 and GW170814 with dynesty
In this example, we repeat the above analysis, but using the dynesty
sampler. We can use the same prior, model, and o1 and o2 inference
configuration files as above. New files that we need are:
The sampler configuration file for dynesty:
;==============================================================================
;
; Dynesty settings
;
;==============================================================================
;
; The following provides standard settings for the dynesty sampler.
;
[sampler]
name = dynesty
dlogz = 0.1
nlive = 2000
An events file which uses dynesty:
[event-gw150914]
label = GW150914+09:50:45UTC
config-files = bbh-uniform_comoving_volume.ini
marginalized_phase.ini
dynesty.ini
o1.ini
config-overrides = data:trigger-time:1126259462.413
data:analysis-start-time:-8
data:analysis-end-time:2
; We can run multiple instances of inference to accumulate more samples by
; setting nruns. This is useful for emcee_pt. However, dynesty generally
; produces enough samples in a single run, so we'll leave this commented out.
; (The default is to do 1 run.)
;nruns = 2
[event-gw170814]
label = GW170814+10:30:43UTC
config-files = bbh-uniform_comoving_volume.ini
marginalized_phase.ini
dynesty.ini
o2.ini
config-overrides = data:trigger-time:1186741861.533
data:analysis-start-time:-8
data:analysis-end-time:2
Note that here, we are not running pycbc_inference
multiple times. This is
because a single run of dynesty
with the settings we are using (2000 live
points) produces a large number (O(10 000)) of samples.
We also need a slightly different
workflow configuration file
. The only difference from the workflow configuration file above is that the
diagnostic plot executables have been removed (plot_acceptance_rate and
plot_samples). This is because these diagnostics do not work for dynesty, a
nested sampler. As above, set the nprocesses argument in the [inference]
section to the number of cores that works for your cluster.
Note that we could have run both the emcee_pt
analysis, above, and the
dynesty
analysis together in a single workflow. However, to do so, we would
need to remove any diagnostic plots that are unique to each sampler.
Once you have downloaded the necessary files, create the workflow and launch
it using the same create_workflow.sh
script and pycbc_submit_dax
commands as above, making sure to change the WORKFLOW_NAME and SEED.
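For example, only the variable settings at the top of create_workflow.sh need to change; the seed value below is an arbitrary example (any new value will do), and the rest of the script and the pycbc_submit_dax command stay the same.

WORKFLOW_NAME=inference-dynesty-gw150914_gw170814
SEED=410721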
This will produce a results page that looks like the example inference-dynesty-gw150914_gw170814.