pycbc.inference package

Submodules

pycbc.inference.burn_in module

This module provides classes and functions for determining when Markov chains have burned in.

class pycbc.inference.burn_in.BaseBurnInTests(sampler, burn_in_test, **kwargs)[source]

Bases: object

Base class for burn in tests.

available_tests = ('halfchain', 'min_iterations', 'max_posterior', 'posterior_step', 'nacl')
abstract burn_in_index(filename)[source]

The burn in index (retrieved from the iteration).

This is an abstract method because how this is evaluated depends on whether or not this is an ensemble MCMC.

abstract evaluate(filename)[source]

Performs all tests and evaluates the results to determine if and when all tests pass.

classmethod from_config(cp, sampler)[source]

Loads burn in from section [sampler-burn_in].

halfchain(filename)[source]

Just uses half the chain as the burn-in iteration.

abstract max_posterior(filename)[source]

Carries out the max posterior test and stores the results.

min_iterations(filename)[source]

Just checks that the sampler has been run for the minimum number of iterations.

abstract nacl(filename)[source]

Carries out the nacl test and stores the results.

abstract posterior_step(filename)[source]

Carries out the posterior step test and stores the results.

write(fp, path=None)[source]

Writes burn-in info to an open HDF file.

Parameters:
  • fp (pycbc.inference.io.base.BaseInferenceFile) – Open HDF file to write the data to. The HDF file should be an instance of a pycbc BaseInferenceFile.

  • path (str, optional) – Path in the HDF file to write the data to. Default (None) is to write to the path given by the file’s sampler_group attribute.

class pycbc.inference.burn_in.EnsembleMCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]

Bases: BaseBurnInTests

Provides methods for estimating burn-in of an ensemble MCMC.

available_tests = ('halfchain', 'min_iterations', 'max_posterior', 'posterior_step', 'nacl', 'ks_test')
burn_in_index(filename)[source]

The burn in index (retrieved from the iteration).

evaluate(filename)[source]

Runs all of the burn-in tests.

ks_test(filename)[source]

Applies ks burn-in test.

max_posterior(filename)[source]

Applies max posterior test to self.

nacl(filename)[source]

Applies the nacl() test.

posterior_step(filename)[source]

Applies the posterior-step test.

class pycbc.inference.burn_in.EnsembleMultiTemperedMCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]

Bases: EnsembleMCMCBurnInTests

Adds support for multiple temperatures to EnsembleMCMCBurnInTests.

class pycbc.inference.burn_in.MCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]

Bases: BaseBurnInTests

Burn-in tests for collections of independent MCMC chains.

This differs from EnsembleMCMCBurnInTests in that chains are treated as being independent of each other. The is_burned_in attribute will be True if any chain passes the burn in tests (whereas in EnsembleMCMCBurnInTests, all chains must pass the burn in tests). In other words, independent samples can be collected even if not all of the chains are burned in.

burn_in_index(filename)[source]

The burn in index (retrieved from the iteration).

evaluate(filename)[source]

Runs all of the burn-in tests.

max_posterior(filename)[source]

Applies max posterior test.

nacl(filename)[source]

Applies the nacl() test.

posterior_step(filename)[source]

Applies the posterior-step test.

write(fp, path=None)[source]

Writes burn-in info to an open HDF file.

Parameters:
  • fp (pycbc.inference.io.base.BaseInferenceFile) – Open HDF file to write the data to. The HDF file should be an instance of a pycbc BaseInferenceFile.

  • path (str, optional) – Path in the HDF file to write the data to. Default (None) is to write to the path given by the file’s sampler_group attribute.

class pycbc.inference.burn_in.MultiTemperedMCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]

Bases: MCMCBurnInTests

Adds support for multiple temperatures to MCMCBurnInTests.

pycbc.inference.burn_in.evaluate_tests(burn_in_test, test_is_burned_in, test_burn_in_iter)[source]

Evaluates burn in data from multiple tests.

The iteration to use for burn-in depends on the logic in the burn-in test string. For example, if the test was ‘max_posterior | nacl’ and max_posterior burned-in at iteration 5000 while nacl burned in at iteration 6000, we’d want to use 5000 as the burn-in iteration. However, if the test was ‘max_posterior & nacl’, we’d want to use 6000 as the burn-in iteration. This function handles all cases by doing the following: first, take the collection of burn in iterations from all the burn in tests that were applied. Next, cycle over the iterations in increasing order, checking which tests have burned in by that point. Then evaluate the burn-in string at that point to see if it passes, and if so, what the iteration is. The first point that the test passes is used as the burn-in iteration.

Parameters:
  • burn_in_test (str) – The test to apply; e.g., 'max_posterior & nacl'.

  • test_is_burned_in (dict) – Dictionary of test name -> boolean indicating whether a specific burn in test has passed.

  • test_burn_in_iter (dict) – Dictionary of test name -> int indicating when a specific test burned in.

Returns:

  • is_burned_in (bool) – Whether or not the data passes all burn in tests.

  • burn_in_iteration (int) – The iteration at which all the tests pass. If the tests did not all pass (is_burned_in is false), then NOT_BURNED_IN_ITER is returned.
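
For illustration, here is a minimal sketch of the logic described above. This is a hypothetical standalone re-implementation, not the PyCBC function itself; the function name, the sentinel value, and the use of eval are assumptions.

# Hypothetical sketch of the burn-in string evaluation described above.
NOT_BURNED_IN_ITER = -1  # placeholder sentinel; the actual module constant may differ

def evaluate_tests_sketch(burn_in_test, test_is_burned_in, test_burn_in_iter):
    # Collect the burn-in iterations of the tests that passed, in
    # increasing order.
    iters = sorted(test_burn_in_iter[t]
                   for t, ok in test_is_burned_in.items() if ok)
    for ii in iters:
        # Which tests have burned in by iteration ii?
        passed = {t: ok and test_burn_in_iter[t] <= ii
                  for t, ok in test_is_burned_in.items()}
        # The test string uses '&' and '|', which act as boolean
        # operators on Python bools, so it can be evaluated directly.
        if eval(burn_in_test, {'__builtins__': None}, passed):
            return True, ii
    return False, NOT_BURNED_IN_ITER

For example, with the tests from the description above:

>>> evaluate_tests_sketch('max_posterior & nacl',
...                       {'max_posterior': True, 'nacl': True},
...                       {'max_posterior': 5000, 'nacl': 6000})
(True, 6000)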

pycbc.inference.burn_in.ks_test(samples1, samples2, threshold=0.9)[source]

Applies a KS test to determine if two sets of samples are the same.

The KS test is applied parameter-by-parameter. If the two-tailed p-value returned by the test is greater than the threshold, the samples are considered to be the same.

Parameters:
  • samples1 (dict) – Dictionary mapping parameters to the first set of samples.

  • samples2 (dict) – Dictionary mapping parameters to the second set of samples.

  • threshold (float) – The threshold to use for the p-value. Default is 0.9.

Returns:

Dictionary mapping parameter names to booleans indicating whether the given parameter passes the KS test.

Return type:

dict
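
A minimal sketch of the per-parameter comparison, assuming scipy.stats.ks_2samp; this is an illustrative re-implementation, not necessarily how the PyCBC function is written.

# Hypothetical sketch of a parameter-by-parameter KS test.
from scipy.stats import ks_2samp

def ks_test_sketch(samples1, samples2, threshold=0.9):
    is_the_same = {}
    for param in samples1:
        # Two-sample KS test; the parameter "passes" if the two-tailed
        # p-value exceeds the threshold.
        _, pvalue = ks_2samp(samples1[param], samples2[param])
        is_the_same[param] = pvalue > threshold
    return is_the_same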

pycbc.inference.burn_in.max_posterior(lnps_per_walker, dim)[source]

Burn in based on samples being within dim/2 of maximum posterior.

Parameters:
  • lnps_per_walker (2D array) – Array of values that are proportional to the log posterior values. Must have shape nwalkers x niterations.

  • dim (int) – The dimension of the parameter space.

Returns:

  • burn_in_idx (array of int) – The burn in indices of each walker. If a walker is not burned in, its index will be equal to the length of the chain.

  • is_burned_in (array of bool) – Whether or not a walker is burned in.
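
A sketch of the test under the stated definition; the function name is hypothetical, and details such as the handling of un-burned-in walkers may differ from the actual implementation.

# Hypothetical sketch of the max posterior burn-in test.
import numpy

def max_posterior_sketch(lnps_per_walker, dim):
    # A walker is burned in at the first iteration where its log
    # posterior comes within dim/2 of the maximum over all walkers.
    criterion = lnps_per_walker.max() - dim / 2.0
    nwalkers, niterations = lnps_per_walker.shape
    burn_in_idx = numpy.full(nwalkers, niterations, dtype=int)
    is_burned_in = numpy.zeros(nwalkers, dtype=bool)
    for wi in range(nwalkers):
        passed = numpy.where(lnps_per_walker[wi, :] >= criterion)[0]
        if passed.size > 0:
            burn_in_idx[wi] = passed[0]
            is_burned_in[wi] = True
    return burn_in_idx, is_burned_in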

pycbc.inference.burn_in.nacl(nsamples, acls, nacls=5)[source]

Burn in based on ACL.

This applies the following test to determine burn in:

  1. The first half of the chain is ignored.

  2. An ACL is calculated from the second half.

  3. If nacls times the ACL is less than half the length of the chain, the chain is considered to be burned in at the half-way point.

Parameters:
  • nsamples (int) – The number of samples in the chain(s).

  • acls (dict) – Dictionary of parameter -> ACL(s). The ACLs for each parameter may be an integer or an array of integers (for multiple chains).

  • nacls (int, optional) – The number of ACLs the chain(s) must have gone past the halfway point in order to be considered burned in. Default is 5.

Returns:

Dictionary of parameter -> boolean(s) indicating if the chain(s) pass the test. If an array of values was provided for the acls, the values will be arrays of booleans.

Return type:

dict
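
Under the stated definition the test reduces to a one-line comparison per parameter; a sketch (the function name is hypothetical):

# Hypothetical sketch of the nacl burn-in test.
import numpy

def nacl_sketch(nsamples, acls, nacls=5):
    # The chain is burned in at the halfway point if nacls ACLs
    # (measured from the second half) fit within the second half.
    kstart = int(nsamples / 2.0)
    return {param: (nacls * numpy.array(acl)) < kstart
            for param, acl in acls.items()}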

pycbc.inference.burn_in.posterior_step(logposts, dim)[source]

Finds the last time a chain made a jump > dim/2.

Parameters:
  • logposts (array) – 1D array of values that are proportional to the log posterior values.

  • dim (int) – The dimension of the parameter space.

Returns:

The index of the last time the logpost made a jump > dim/2. If that never happened, returns 0.

Return type:

int
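
A sketch of the search for the last large jump; this is an illustrative re-implementation under the stated definition, not necessarily the PyCBC code.

# Hypothetical sketch of the posterior-step test.
import numpy

def posterior_step_sketch(logposts, dim):
    # Find the last index at which the log posterior changed by more
    # than dim/2 relative to the previous sample; 0 if it never did.
    jumps = numpy.where(numpy.abs(numpy.diff(logposts)) > dim / 2.0)[0]
    return int(jumps[-1] + 1) if jumps.size else 0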

pycbc.inference.entropy module

The module contains functions for calculating the Kullback-Leibler divergence.

pycbc.inference.entropy.check_hist_params(samples, hist_min, hist_max, hist_bins)[source]

Checks that the bound values given for the histogram are consistent, returning the range if they are or raising an error if they are not. Also checks that, if hist_bins is a str, it corresponds to a method available in numpy.histogram.

Parameters:
  • samples (numpy.array) – Set of samples to get the min/max if only one of the bounds is given.

  • hist_min (numpy.float64) – Minimum value for the histogram.

  • hist_max (numpy.float64) – Maximum value for the histogram.

  • hist_bins (int or str) – If int, number of equal-width bins to use in numpy.histogram. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.

Returns:

  • hist_range (tuple or None) – The bounds (hist_min, hist_max) or None.

  • hist_bins (int or str) – Number of bins or method for optimal width bin calculation.

pycbc.inference.entropy.compute_pdf(samples, method, bins, hist_min, hist_max)[source]

Computes the probability density function for a set of samples.

Parameters:
  • samples (numpy.array) – Set of samples to calculate the pdf.

  • method (str) – Method to calculate the pdf. Options are ‘kde’ for the Kernel Density Estimator, and ‘hist’ to use numpy.histogram

  • bins (str or int, optional) – This option will be ignored if method is kde. If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator).

  • hist_min (numpy.float64, optional) – Minimum of the distributions’ values to use. This will be ignored if kde=True.

  • hist_max (numpy.float64, optional) – Maximum of the distributions’ values to use. This will be ignored if kde=True.

Returns:

pdf – Discrete probability distribution calculated from samples.

Return type:

numpy.array
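
A minimal sketch of the two code paths described above, assuming scipy.stats.gaussian_kde for the KDE branch; the 100-point evaluation grid is an arbitrary illustrative choice, not taken from the source.

# Hypothetical sketch of the pdf computation.
import numpy
from scipy.stats import gaussian_kde

def compute_pdf_sketch(samples, method, bins, hist_min, hist_max):
    if method == 'kde':
        # Evaluate a Gaussian KDE on a regular grid spanning the samples.
        kde = gaussian_kde(samples)
        grid = numpy.linspace(samples.min(), samples.max(), 100)
        pdf = kde.evaluate(grid)
    else:
        # Histogram-based estimate, normalized to a density.
        hist_range = None if hist_min is None else (hist_min, hist_max)
        pdf, _ = numpy.histogram(samples, bins=bins, range=hist_range,
                                 density=True)
    return pdf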

pycbc.inference.entropy.entropy(pdf1, base=2.718281828459045)[source]

Computes the information entropy for a single parameter from one probability density function.

Parameters:
  • pdf1 (numpy.array) – Probability density function.

  • base ({numpy.e, numpy.float64}, optional) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).

Returns:

The information entropy value.

Return type:

numpy.float64
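
The computation is the standard Shannon entropy of the discrete pdf; a minimal sketch, assuming the input has already been normalized to sum to one (the function name is hypothetical):

# Hypothetical sketch: Shannon entropy of a discrete pdf.
import numpy

def entropy_sketch(pdf1, base=numpy.e):
    # -sum(p * log(p)) in the requested base; zero-probability bins
    # contribute nothing to the sum.
    p = pdf1[pdf1 > 0]
    return -numpy.sum(p * numpy.log(p)) / numpy.log(base)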

pycbc.inference.entropy.js(samples1, samples2, kde=False, bins=None, hist_min=None, hist_max=None, base=2.718281828459045)[source]

Computes the Jensen-Shannon divergence for a single parameter from two distributions.

Parameters:
  • samples1 (numpy.array) – Samples.

  • samples2 (numpy.array) – Samples.

  • kde (bool) – Set to True to estimate the probability density function using kernel density estimation (KDE).

  • bins (int or str, optional) – If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.

  • hist_min (numpy.float64) – Minimum of the distributions’ values to use. This will be ignored if kde=True.

  • hist_max (numpy.float64) – Maximum of the distributions’ values to use. This will be ignored if kde=True.

  • base (numpy.float64) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).

Returns:

The Jensen-Shannon divergence value.

Return type:

numpy.float64
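
At the level of two already-computed pdfs, the Jensen-Shannon divergence is the symmetrized KL divergence against the mixture; a sketch using scipy.stats.entropy, which computes KL when given two distributions (the function name is hypothetical):

# Hypothetical sketch: JS divergence from two discrete pdfs.
import numpy
from scipy.stats import entropy as scipy_entropy

def js_from_pdfs_sketch(pdf1, pdf2, base=numpy.e):
    # JS(p, q) = [KL(p || m) + KL(q || m)] / 2, with m = (p + q) / 2.
    pdfm = 0.5 * (pdf1 + pdf2)
    return 0.5 * (scipy_entropy(pdf1, pdfm, base=base)
                  + scipy_entropy(pdf2, pdfm, base=base))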

pycbc.inference.entropy.kl(samples1, samples2, pdf1=False, pdf2=False, kde=False, bins=None, hist_min=None, hist_max=None, base=2.718281828459045)[source]

Computes the Kullback-Leibler divergence for a single parameter from two distributions.

Parameters:
  • samples1 (numpy.array) – Samples or probability density function (for the latter must also set pdf1=True).

  • samples2 (numpy.array) – Samples or probability density function (for the latter must also set pdf2=True).

  • pdf1 (bool) – Set to True if samples1 is a probability density function already.

  • pdf2 (bool) – Set to True if samples2 is a probability density function already.

  • kde (bool) – Set to True if at least one of pdf1 or pdf2 is False to estimate the probability density function using kernel density estimation (KDE).

  • bins (int or str, optional) – If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.

  • hist_min (numpy.float64) – Minimum of the distributions’ values to use. This will be ignored if kde=True.

  • hist_max (numpy.float64) – Maximum of the distributions’ values to use. This will be ignored if kde=True.

  • base (numpy.float64) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).

Returns:

The Kullback-Leibler divergence value.

Return type:

numpy.float64
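
Once both inputs have been reduced to pdfs, the divergence itself is a single call; a sketch of that final step (scipy.stats.entropy computes KL(p || q) when given two distributions; the function name is hypothetical):

# Hypothetical sketch: KL divergence from two discrete pdfs.
import numpy
from scipy.stats import entropy as scipy_entropy

def kl_from_pdfs_sketch(pdf1, pdf2, base=numpy.e):
    # KL(p || q) = sum(p * log(p / q)) in the requested base.
    return scipy_entropy(pdf1, pdf2, base=base)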

pycbc.inference.evidence module

This module provides functions for estimating the marginal likelihood or evidence of a model.

pycbc.inference.evidence.arithmetic_mean_estimator(log_likelihood)[source]

Returns the log evidence via the prior arithmetic mean estimator (AME).

The logarithm form of AME is used. This is the most basic evidence estimator, and often requires O(billions) of samples from the prior.

Parameters:

log_likelihood (1d array of floats) – The log likelihood of the data sampled from the prior distribution.

Returns:

Estimation of the log of the evidence.

Return type:

float
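
In logarithm form the estimator is log Z ≈ logsumexp(logL) - log N; a sketch assuming scipy.special.logsumexp (the function name is hypothetical):

# Hypothetical sketch of the arithmetic mean estimator in log form.
import numpy
from scipy.special import logsumexp

def ame_sketch(log_likelihood):
    # log Z ≈ log( (1/N) * sum_i exp(logL_i) ), logL_i drawn from the prior.
    log_likelihood = numpy.asarray(log_likelihood)
    return logsumexp(log_likelihood) - numpy.log(len(log_likelihood))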

pycbc.inference.evidence.harmonic_mean_estimator(log_likelihood)[source]

Returns the log evidence via posterior harmonic mean estimator (HME).

The logarithm form of HME is used. This method is not recommended for general use: it is very slow to converge, formally has infinite variance, and is very error prone.

Parameters:

log_likelihood (1d array of floats) – The log likelihood of the data sampled from the posterior distribution.

Returns:

Estimation of the log of the evidence.

Return type:

float
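
The logarithm form flips the sign inside the logsumexp; a sketch (the function name is hypothetical):

# Hypothetical sketch of the harmonic mean estimator in log form.
import numpy
from scipy.special import logsumexp

def hme_sketch(log_likelihood):
    # log Z ≈ -log( (1/N) * sum_i exp(-logL_i) ), logL_i drawn from the
    # posterior. The sum is dominated by the smallest likelihoods,
    # which is why this estimator is so unstable.
    log_likelihood = numpy.asarray(log_likelihood)
    return numpy.log(len(log_likelihood)) - logsumexp(-log_likelihood)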

pycbc.inference.evidence.stepping_stone_algorithm(log_likelihood, betas)[source]

Returns the log evidence of the model via the stepping stone algorithm. Also returns an estimated standard deviation for the log evidence.

Parameters:
  • log_likelihood (3d array of shape (betas, walker, iteration)) – The log likelihood for each temperature separated by temperature, walker, and iteration.

  • betas (1d array) – The inverse temperatures used in the MCMC.

Returns:

  • log_evidence (float) – Estimation of the log of the evidence.

  • mcmc_std (float) – The standard deviation of the log evidence estimate from Monte-Carlo spread.
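
A sketch of the estimator itself, with the standard deviation omitted for brevity; this is the textbook form of the algorithm (the function name is hypothetical). It accumulates, over adjacent temperature pairs, the log of the mean of exp((beta_{k+1} - beta_k) * logL) over the samples at beta_k.

# Hypothetical sketch of the stepping stone log evidence estimate.
import numpy
from scipy.special import logsumexp

def stepping_stone_sketch(log_likelihood, betas):
    # Sort the temperatures in increasing order of beta and sum the
    # log ratio estimates between adjacent temperatures.
    order = numpy.argsort(betas)
    betas = numpy.asarray(betas)[order]
    logls = numpy.asarray(log_likelihood)[order]
    logz = 0.0
    for k in range(len(betas) - 1):
        dbeta = betas[k + 1] - betas[k]
        samples = logls[k].ravel()  # flatten walkers and iterations
        logz += logsumexp(dbeta * samples) - numpy.log(samples.size)
    return logz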

pycbc.inference.evidence.thermodynamic_integration(log_likelihood, betas, method='simpsons')[source]

Returns the log evidence of the model via thermodynamic integration. Also returns an estimated standard deviation for the log evidence.

Current options are integration through the trapezoid rule, a first-order corrected trapezoid rule, and Simpson’s rule.

Parameters:
  • log_likelihood (3d array of shape (betas, walker, iteration)) – The log likelihood for each temperature separated by temperature, walker, and iteration.

  • betas (1d array) – The inverse temperatures used in the MCMC.

  • method ({'trapezoid', 'trapezoid_corrected', 'simpsons'}, optional) – The numerical integration method to use for the thermodynamic integration: the trapezoid rule, the first-order correction to the trapezoid rule, or Simpson’s rule. Default is 'simpsons'.

Returns:

  • log_evidence (float) – Estimation of the log of the evidence.

  • mcmc_std (float) – The standard deviation of the log evidence estimate from Monte-Carlo spread.
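
The underlying identity is log Z = ∫ <log L>_beta dbeta, integrated from beta = 0 to 1; a sketch using the plain trapezoid rule (the other rules differ only in the quadrature; the function name is hypothetical and the error estimate is omitted):

# Hypothetical sketch of thermodynamic integration via the trapezoid rule.
import numpy

def thermodynamic_integration_sketch(log_likelihood, betas):
    # Average the log likelihood over walkers and iterations at each
    # temperature, then integrate the resulting curve over beta.
    order = numpy.argsort(betas)
    betas = numpy.asarray(betas)[order]
    logls = numpy.asarray(log_likelihood)[order]
    mean_logl = logls.reshape(len(betas), -1).mean(axis=1)
    return numpy.trapz(mean_logl, betas)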

pycbc.inference.gelman_rubin module

This module provides functions for evaluating the Gelman-Rubin convergence diagnostic statistic.

pycbc.inference.gelman_rubin.gelman_rubin(chains, auto_burn_in=True)[source]

Calculates the univariate Gelman-Rubin convergence statistic, which compares the evolution of multiple chains in a Markov chain Monte Carlo process to determine their convergence. The between-chain and within-chain variances are computed for each sampling parameter, and a weighted combination of the two is used to determine the convergence. As the chains converge, the potential scale reduction factor should go to 1.

Parameters:
  • chains (iterable) – An iterable of numpy.array instances that contain the samples for each chain. Each chain has shape (nparameters, niterations).

  • auto_burn_in (bool) – If True, only the later half of the samples provided is used.

Returns:

psrf – A numpy.array of shape (nparameters) that has the point estimates of the potential scale reduction factor.

Return type:

numpy.array
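
A sketch of the potential scale reduction factor for a single parameter; this hypothetical helper takes an array of shape (nchains, niterations) and follows the standard Gelman-Rubin construction, not necessarily the PyCBC code path.

# Hypothetical sketch of the PSRF for one parameter.
import numpy

def psrf_sketch(chains):
    # chains: array of shape (nchains, niterations) for one parameter.
    chains = numpy.asarray(chains)
    n = chains.shape[1]
    # Between-chain variance of the per-chain means (scaled by n), and
    # the mean within-chain variance.
    between = n * chains.mean(axis=1).var(ddof=1)
    within = chains.var(axis=1, ddof=1).mean()
    # Weighted combination; the PSRF tends to 1 as the chains converge.
    vhat = (1.0 - 1.0 / n) * within + between / n
    return numpy.sqrt(vhat / within)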

pycbc.inference.gelman_rubin.walk(chains, start, end, step)[source]

Calculates the Gelman-Rubin convergence statistic along chains of data. This function advances along the chains and calculates the statistic for each step.

Parameters:
  • chains (iterable) – An iterable of numpy.array instances that contain the samples for each chain. Each chain has shape (nparameters, niterations).

  • start (float) – Start index of blocks to calculate all statistics.

  • end (float) – Last index of blocks to calculate statistics.

  • step (float) – Step size to take for next block.

Returns:

  • starts (numpy.array) – 1-D array of start indexes of calculations.

  • ends (numpy.array) – 1-D array of end indexes of calculations.

  • stats (numpy.array) – Array with convergence statistic. It has shape (nparameters, ncalculations).

pycbc.inference.geweke module

Functions for computing the Geweke convergence statistic.

pycbc.inference.geweke.geweke(x, seg_length, seg_stride, end_idx, ref_start, ref_end=None, seg_start=0)[source]

Calculates the Geweke convergence statistic for a chain of data. This function advances along the chain and calculates the statistic for each step.

Parameters:
  • x (numpy.array) – A one-dimensional array of data.

  • seg_length (int) – Number of samples to use for each Geweke calculation.

  • seg_stride (int) – Number of samples to advance before next Geweke calculation.

  • end_idx (int) – Index of last start.

  • ref_start (int) – Index of the beginning of the reference segment at the end of the chain.

  • ref_end (int) – Index of the end of the reference segment. Default is None, which goes to the end of the data array.

  • seg_start (int) – Index at which to start computing the statistic. Default is 0, the beginning of the data array.

Returns:

  • starts (numpy.array) – The start indices of the segments in the chain.

  • ends (numpy.array) – The end indices of the segments in the chain.

  • stats (numpy.array) – The Geweke convergence diagnostic statistic for each segment.
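
The statistic for each segment is the standard Geweke z-score against the reference segment; a sketch of that single comparison (the full function slides this along the chain; the function name is hypothetical):

# Hypothetical sketch of a single Geweke z-score.
import numpy

def geweke_zscore_sketch(segment, reference):
    # Difference of the segment means, scaled by the combined spread of
    # the two segments; values near 0 indicate the segment has converged
    # to the reference distribution.
    return ((segment.mean() - reference.mean())
            / numpy.sqrt(segment.var() + reference.var()))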

pycbc.inference.option_utils module

This module contains standard options used for inference-related programs.

class pycbc.inference.option_utils.ParseLabelArg(type=<class 'str'>, nargs=None, **kwargs)[source]

Bases: Action

Argparse action that will parse arguments that can accept labels.

This assumes that the values set on the command line for its assigned argument are strings formatted like PARAM[:LABEL]. When the arguments are parsed, the LABEL bit is stripped off and added to a dictionary mapping PARAM -> LABEL. This dictionary is stored in the parsed namespace as {dest}_labels, where {dest} is the argument’s dest setting (by default, this is the same as the option string). Likewise, the argument’s dest in the parsed namespace is updated so that it is just PARAM.

If no LABEL is provided, then PARAM will be used for LABEL.

This action can work on arguments that have nargs != 0 and type set to str.
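
The per-value parsing amounts to a single split; a sketch of that step (a hypothetical standalone helper, not part of the class):

# Hypothetical sketch of the PARAM[:LABEL] splitting described above.
def split_param_label(value):
    # 'PARAM:LABEL' -> ('PARAM', 'LABEL'); bare 'PARAM' -> ('PARAM', 'PARAM')
    param, _, label = value.partition(':')
    return param, (label if label else param)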

class pycbc.inference.option_utils.ParseParametersArg(type=<class 'str'>, nargs=None, **kwargs)[source]

Bases: ParseLabelArg

Argparse action that will parse parameters and labels from an option.

Does the same as ParseLabelArg, with the additional functionality that if LABEL is a known parameter in pycbc.waveform.parameters, then the label attribute there will be used in the labels dictionary. Otherwise, LABEL will be used.

Examples

Create a parser and add two arguments that use this action (note that the first argument accepts multiple inputs while the second only accepts a single input):

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--parameters', type=str, nargs="+",
                        action=ParseParametersArg)
>>> parser.add_argument('--z-arg', type=str, action=ParseParametersArg)

Parse a command line that uses these options:

>>> import shlex
>>> cli = "--parameters 'mass1+mass2:mtotal' ra ni --z-arg foo:bar"
>>> opts = parser.parse_args(shlex.split(cli))
>>> opts.parameters
['mass1+mass2', 'ra', 'ni']
>>> opts.parameters_labels
{'mass1+mass2': '$M~(\mathrm{M}_\odot)$', 'ni': 'ni', 'ra': '$\alpha$'}
>>> opts.z_arg
'foo'
>>> opts.z_arg_labels
{'foo': 'bar'}

In the above, the label provided for the first argument to --parameters was mtotal. Since this is a recognized parameter in pycbc.waveform.parameters, the label dictionary contains the latex string associated with the mtotal parameter. A label was not provided for the second argument, and so ra was used. Since ra is also a recognized parameter, its associated latex string was used in the labels dictionary. Since ni and bar (the label for z-arg) are not recognized parameters, they were used as-is in the labels dictionaries.

pycbc.inference.option_utils.add_density_option_group(parser)[source]

Adds the options needed to configure contours and density colour map.

Parameters:

parser (object) – ArgumentParser instance.

pycbc.inference.option_utils.add_injsamples_map_opt(parser)[source]

Adds an option to the parser to specify a mapping between injection parameters and sample parameters.

pycbc.inference.option_utils.add_plot_posterior_option_group(parser)[source]

Adds the options needed to configure plots of posterior results.

Parameters:

parser (object) – ArgumentParser instance.

pycbc.inference.option_utils.add_scatter_option_group(parser)[source]

Adds the options needed to configure scatter plots.

Parameters:

parser (object) – ArgumentParser instance.

pycbc.inference.option_utils.expected_parameters_from_cli(opts)[source]

Parses the --expected-parameters arguments from the plot_posterior option group.

Parameters:

opts (ArgumentParser) – The parsed arguments from the command line.

Returns:

Dictionary of parameter name -> expected value. Only parameters that were specified in the --expected-parameters option will be included; if no parameters were provided, will return an empty dictionary.

Return type:

dict

pycbc.inference.option_utils.plot_ranges_from_cli(opts)[source]

Parses the mins and maxs arguments from the plot_posterior option group.

Parameters:

opts (ArgumentParser) – The parsed arguments from the command line.

Returns:

  • mins (dict) – Dictionary of parameter name -> specified mins. Only parameters that were specified in the --mins option will be included; if no parameters were provided, will return an empty dictionary.

  • maxs (dict) – Dictionary of parameter name -> specified maxs. Only parameters that were specified in the --maxs option will be included; if no parameters were provided, will return an empty dictionary.

Module contents