# The workflow segment generation module

## Introduction

This page is designed to give you an introduction to the capabilities of the pycbc workflow segment generation module and how to use this as part of a pycbc workflow.

This module is designed to support multiple ways of obtaining segments (different codes or interfaces), although new code will always be needed to support a code or interface that is not currently supported.

This module generates science segments and any appropriate veto segments, and combines these to identify the set of segments to be used in the analysis. The various files are also returned for later use in the analysis (i.e. for vetoing triggers with data-quality vetoes). If other workflows require similar combined files, these can be added on request.

## Usage

Using this module requires the following:

• A configuration file (or files) containing the information needed to tell this module how to generate the segments (described below).
• An initialized instance of the pycbc Workflow class, containing the ConfigParser.

The module is then called as follows:

pycbc.workflow.get_segments_file(workflow, name, option_name, out_dir)

Get cumulative segments from option name syntax for each ifo.

Use the syntax of the configparser string to define the resulting segment file, e.g. option_name = +up_flag1,+up_flag2,+up_flag3,-down_flag1,-down_flag2. Each ifo may have a different string, and each is stored separately in the file. Flags which add time must precede flags which subtract time.

Parameters:
• workflow (pycbc.workflow.Workflow)
• name (string) – Name of the segment list being created.
• option_name (str) – Name of the option in the associated config parser from which to get the flag list.

Returns:
• seg_file (pycbc.workflow.SegFile) – SegFile instance that points to the segment XML file on disk.
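To make the option-string syntax concrete, here is a minimal standalone sketch. It is not PyCBC's own parser: `split_flag_string` is a hypothetical helper that assumes only the rules stated above, namely comma-separated flags, each prefixed with `+` or `-`, with every flag that adds time preceding every flag that subtracts time.

```python
def split_flag_string(flag_str):
    """Split '+a,+b,-c' into (['a', 'b'], ['c']).

    Hypothetical helper, not part of PyCBC. Raises ValueError if a flag
    lacks a +/- prefix or if a '+' flag appears after a '-' flag.
    """
    add_flags, sub_flags = [], []
    for raw in flag_str.split(','):
        flag = raw.strip()
        if not flag:
            continue  # tolerate stray or trailing commas
        sign, name = flag[0], flag[1:]
        if sign == '+':
            if sub_flags:
                raise ValueError("'+' flag %r follows a '-' flag" % flag)
            add_flags.append(name)
        elif sign == '-':
            sub_flags.append(name)
        else:
            raise ValueError("flag %r must start with '+' or '-'" % flag)
    return add_flags, sub_flags


up, down = split_flag_string(
    "+up_flag1,+up_flag2,+up_flag3,-down_flag1,-down_flag2")
print(up)    # ['up_flag1', 'up_flag2', 'up_flag3']
print(down)  # ['down_flag1', 'down_flag2']
```

Rejecting a `+` flag that follows a `-` flag mirrors the ordering rule in the description; whether PyCBC reports this as an error in the same way is not something this sketch claims.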

### Configuration file setup

Here we document the necessary parts of a configuration file for this module. We’ll lay this out in a few blocks. First we’ll give an overview of what this might look like in an O3 analysis, to give some background, then we’ll give a more comprehensive description of the input format in the ini file, and finally we’ll explain the full syntax that can be used for individual flags.

#### Example config file

[workflow]
; http://pycbc.org/pycbc/latest/html/workflow/initialization.html
h1-channel-name = H1:GDS-CALIB_STRAIN
l1-channel-name = L1:GDS-CALIB_STRAIN

[workflow-ifos]
h1 =
l1 =

[workflow-datafind]
datafind-h1-frame-type = H1_HOFT_C00
datafind-l1-frame-type = L1_HOFT_C00
;datafind-check-frames-exist = no_test

[workflow-segments]
segments-database-url = https://segments.ligo.org
segments-veto-definer-url = https://git.ligo.org/detchar/veto-definitions/raw/db20ca71e65b54c0b073fd3d84d5f43fd822779e/cbc/O2/H1L1-CBC_VETO_DEFINER_CLEANED_C02_O2_1164556817-23176801.xml
segments-vetoes = +CAT_2,+CAT_H

[workflow-segments-h1]
; NOTE: It's important to check the version number

[workflow-segments-l1]
; NOTE: It's important to check the version number

[datafind]
urltype = file


Note that this includes both datafind and segment instructions. Here we describe only the segment options.

#### Description of the ini file contents

There are three important segment options that we need to provide:

• segments-science: This decides what times will be analysed to produce triggers. All of these times might be used to compute the PSDs used in the results. Normally this is all times flagged as ready for science analysis, minus times vetoed at CAT_1.
• segments-vetoes: This decides what times will be vetoed when producing final candidate event lists. These times are analysed, but are discarded after combining single-detector triggers together. Normally the discarded time comprises times vetoed at CAT_2 and times of hardware injections.
• segments-veto-definer-url: This is the location of the veto definer file.

Note that this obeys the usual workflow tagging rules. If you supply segments-science in workflow-segments it will be valid for all ifos. Alternatively, if you want to supply different values for different ifos (e.g. because the Virgo SCIENCE flag is named differently from those of L1 and H1), you can use workflow-segments-${ifoname} (where ${ifoname} is replaced with the ifo name); the option should then be given for all active ifos.
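The tagging rule can be sketched with the standard-library `configparser`. This is only an illustration under stated assumptions: PyCBC uses its own WorkflowConfigParser, the helper `get_ifo_option` is hypothetical, the flag names in the ini text are made up, and the "ifo-specific section first, shared section otherwise" lookup order is our simplification of the rule described above.

```python
import configparser

# Made-up ini content: vetoes are shared, science flags are per ifo.
EXAMPLE_INI = """
[workflow-segments]
segments-vetoes = +CAT_2,+CAT_H

[workflow-segments-h1]
segments-science = +H1_SCIENCE_FLAG

[workflow-segments-v1]
segments-science = +V1_SCIENCE_FLAG
"""


def get_ifo_option(cp, option, ifo, section="workflow-segments"):
    """Return `option` for `ifo` (hypothetical helper, not PyCBC API).

    Looks in [<section>-<ifo>] first and falls back to [<section>].
    """
    tagged = "%s-%s" % (section, ifo.lower())
    if cp.has_option(tagged, option):
        return cp.get(tagged, option)
    return cp.get(section, option)


cp = configparser.ConfigParser()
cp.read_string(EXAMPLE_INI)

for ifo in ("H1", "V1"):
    print(ifo,
          get_ifo_option(cp, "segments-science", ifo),
          get_ifo_option(cp, "segments-vetoes", ifo))
# H1 +H1_SCIENCE_FLAG +CAT_2,+CAT_H
# V1 +V1_SCIENCE_FLAG +CAT_2,+CAT_H
```

The example deliberately avoids defining the same option in both the shared and the ifo-specific section, since how PyCBC resolves that case is not covered here.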

The segments-science and segments-vetoes options are each provided as a comma-separated list of flags, documented below.

#### Flag syntax

We’ve said that segments-science and segments-vetoes look something like

segments-science = FLAG_1,FLAG_2

We start with a simple example of what can be given as the value of FLAG_1:

+SCIENCE or -SCIENCE
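As an illustration of what the `+` and `-` prefixes do, the sketch below combines flags using plain `(start, end)` tuples. It is not how PyCBC computes segments (PyCBC works with real segment-list objects and queries the segment database); the helper names, the flag name `CAT_1_VETO`, and all GPS-like numbers are invented for the example.

```python
def coalesce(segs):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(segs):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(segs, vetoes):
    """Remove every interval in `vetoes` from `segs`."""
    result = coalesce(segs)
    for v_start, v_end in coalesce(vetoes):
        kept = []
        for start, end in result:
            if v_end <= start or v_start >= end:
                kept.append((start, end))  # no overlap with this veto
                continue
            if start < v_start:
                kept.append((start, v_start))  # piece before the veto
            if v_end < end:
                kept.append((v_end, end))  # piece after the veto
        result = kept
    return result


def combine(flag_segments, flag_str):
    """Union all '+' flags, then subtract all '-' flags."""
    plus, minus = [], []
    for flag in flag_str.split(','):
        flag = flag.strip()
        if flag[0] == '+':
            plus.extend(flag_segments[flag[1:]])
        elif flag[0] == '-':
            minus.extend(flag_segments[flag[1:]])
        else:
            raise ValueError("flag %r must start with '+' or '-'" % flag)
    return subtract(plus, minus)


# Invented segment lists for two flags.
flag_segments = {
    'SCIENCE': [(0, 100), (150, 300)],
    'CAT_1_VETO': [(40, 60), (290, 310)],
}
print(combine(flag_segments, '+SCIENCE,-CAT_1_VETO'))
# [(0, 40), (60, 100), (150, 290)]
```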

### OLD Configuration file setup

Here we describe the options in the workflow configuration file that are needed in this section.

#### [workflow-segments] section

The configuration file must have a [workflow-segments] section, which is used to tell the workflow how to construct the segments. The first option to choose and provide is

segments-method = VALUE

The available choices are:

• AT_RUNTIME - Use setup_segment_gen_mixed to generate segments, and generate all segment files at runtime.
• CAT2_PLUS_DAG - Use setup_segment_gen_mixed to generate segments, generate all veto files up to CATEGORY_1 at runtime, and add jobs to produce the remaining files to the workflow.
• CAT3_PLUS_DAG - Use setup_segment_gen_mixed to generate segments, generate all veto files up to CATEGORY_2 at runtime, and add jobs to produce the remaining files to the workflow.
• CAT4_PLUS_DAG - Use setup_segment_gen_mixed to generate segments, generate all veto files up to CATEGORY_3 at runtime, and add jobs to produce the remaining files to the workflow.
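The four choices differ only in where the run-time/workflow boundary falls. The following sketch reads that boundary off the list above; `split_categories` and the lookup table are hypothetical names used for illustration, not PyCBC code.

```python
# Highest veto category generated at run time for each method,
# as read from the list above. AT_RUNTIME has no upper limit.
RUNTIME_MAX_CATEGORY = {
    'CAT2_PLUS_DAG': 1,
    'CAT3_PLUS_DAG': 2,
    'CAT4_PLUS_DAG': 3,
}


def split_categories(method, veto_categories):
    """Return (run-time categories, in-workflow categories)."""
    cats = sorted(veto_categories)
    if method == 'AT_RUNTIME':
        return cats, []
    if method not in RUNTIME_MAX_CATEGORY:
        raise ValueError("unknown segments-method %r" % method)
    max_cat = RUNTIME_MAX_CATEGORY[method]
    runtime = [c for c in cats if c <= max_cat]
    in_workflow = [c for c in cats if c > max_cat]
    return runtime, in_workflow


print(split_categories('AT_RUNTIME', [1, 2, 3, 4]))     # ([1, 2, 3, 4], [])
print(split_categories('CAT3_PLUS_DAG', [1, 2, 3, 4]))  # ([1, 2], [3, 4])
```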

Each of these options determines which subfunction to use. The subfunction is described here:

pycbc.workflow.setup_segment_gen_mixed(workflow, veto_categories, out_dir, maxVetoAtRunTime, tag=None, generate_coincident_segs=True)

This function will generate veto files for each ifo and for each veto category. It can generate these vetoes at run-time or in the workflow (or do some at run-time and some in the workflow). However, the CAT_1 vetoes and science time must be generated at run time as they are needed to plan the workflow. CATs 2 and higher may be needed for other workflow construction. It can also combine these files to create a set of cumulative, multi-detector veto files, which can be used in ligolw_thinca and in pipedown. Again these can be created at run time or within the workflow.

Parameters:
• workflow (pycbc.workflow.core.Workflow) – The Workflow instance that the coincidence jobs will be added to. This instance also contains the ifos for which to attempt to obtain segments for this analysis and the start and end times to search for segments over.
• veto_categories (list of ints) – List of veto categories to generate segments for. If this stops being integers, this can be changed here.
• out_dir (path) – The directory in which output will be stored.
• maxVetoAtRunTime (int) – Generate veto files at run time up to this category. Veto categories beyond this in veto_categories will be generated in the workflow. If we move to a model where veto categories are not explicitly cumulative, this will be rethought.
• tag (string, optional (default=None)) – Use this to specify a tag. This can be used if this module is being called more than once to give call specific configuration (by setting options in [workflow-datafind-${TAG}] rather than [workflow-datafind]). This is also used to tag the Files returned by the class to uniqueify the Files and uniqueify the actual filename. FIXME: Filenames may not be unique with current codes!
• generate_coincident_segs (boolean, optional (default = True)) – If given this module will generate a set of coincident, cumulative veto files that can be used with ligolw_thinca and pipedown.

Returns:
• segFilesList (dictionary of pycbc.workflow.core.SegFile instances) – These are representations of the various segment files that were constructed at this stage of the workflow and may be needed at later stages of the analysis (e.g. for performing DQ vetoes). If the file was generated at run-time the segment lists contained within these files will be an attribute of the instance. (If it will be generated in the workflow it will not be because I am not psychic).

When using the setup_segment_gen_mixed function the following additional options apply:

• segments-X1-science-name = NAME - REQUIRED. Where X1 is replaced by the ifo name for each ifo.
The NAME should be the full channel name corresponding to analysable times, e.g. H1:DMT-SCIENCE:4.
• segments-database-url = URL - REQUIRED. The URL of the segment database that will be used to obtain this information.
• segments-veto-definer-url = PATH - REQUIRED. The location of the veto-definer file that is used to identify which channels are CAT_1, which are CAT_2 etc.
• segments-veto-categories = COMMA-SEPARATED LIST OF INTS - OPTIONAL. Generate veto files for veto categories given by the ints in the list. These ranged from 1 through 4 or 5 for S5/S6 veto definers. Standard results have used categories 2,3,4.
• segments-minimum-segment-length = INT - OPTIONAL. If given, any segments of analysable data shorter than INT will not be included in the list of analysable times returned by this module.
• segments-generate-coincident-segments - OPTIONAL. Option takes no value. If given the module will generate cumulative, multiple detector coincidence files for easy use in ligolw_thinca and pipedown.
• segments-generate-segment-files - OPTIONAL (DEFAULT='always'). This option can be used if the user wants to re-use segment files generated previously. It is not recommended to use this option unless necessary. Options are:
  • generate_segment_files='always': DEFAULT: All files will be generated even if they already exist.
  • generate_segment_files='if_not_present': Files will be generated if they do not already exist. Pre-existing files will be read in and used.
  • generate_segment_files='error_on_duplicate': Files will be generated if they do not already exist. Pre-existing files will raise a failure.
  • generate_segment_files='never': Pre-existing files will be read in and used. If no file exists the code will fail.

#### [executables]

The following executable paths must be provided in the [executables] section when running this module:

• segment_query = /home/ahnitz/local/lalsuite/bin/ligolw_segment_query
• segments_from_cats = /home/ahnitz/local/lalsuite/bin/ligolw_segments_from_cats
• llwadd = /home/ahnitz/local/lalsuite/bin/ligolw_add
• ligolw_combine_segments = /home/ahnitz/local/lalsuite/bin/ligolw_combine_segments

segment_query is used to obtain the science segments. segments_from_cats is used to obtain the files containing the CAT_1,2,3,4,5 segments. ligolw_combine_segments produces cumulative veto-files. llwadd is used to add the cumulative veto-files from different ifos together when producing cumulative, multiple-detector veto lists.

#### Other sections

For other sub-modules in the pycbc workflow module we would see sections like [segment_query], [segments_from_cats] etc., which would provide the options passed to those jobs. In this case the codes require rather specific input, so for now these are hardcoded in this module and any section like [segment_query] would either be ignored or could break the code. If there is a reason to do so we could add these sections in.

## pycbc.workflow.segment Module

This is complete documentation of this module's code.

This module is responsible for setting up the segment generation stage of workflows. For details about this module and its capabilities see here: https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/ahope/segments.html

pycbc.workflow.segment.add_cumulative_files(workflow, output_file, input_files, out_dir, execute_now=False, tags=None)

Function to combine a set of segment files into a single one. This function will not merge the segment lists but keep each separate.

Parameters:
• workflow (pycbc.workflow.core.Workflow) – An instance of the Workflow class that manages the workflow.
• output_file (pycbc.workflow.core.File) – The output file object.
• input_files (pycbc.workflow.core.FileList) – This list of input segment files.
• out_dir (path) – The directory to write output to.
• execute_now (boolean, optional) – If true, jobs are executed immediately. If false, they are added to the workflow to be run later.
• tags (list of strings, optional) – A list of strings that is used to identify this job.

pycbc.workflow.segment.cat_to_veto_def_cat(val)

Convert a category character to the corresponding value in the veto definer file.

Parameters:
• str (single character string) – The input category character.

Returns:
• pipedown_str (str) – The pipedown equivalent notation that can be passed to programs that expect this definition.

pycbc.workflow.segment.create_segs_from_cats_job(cp, out_dir, ifo_string, tags=None)

This function creates the CondorDAGJob that will be used to run ligolw_segments_from_cats as part of the workflow.

Parameters:
• cp (pycbc.workflow.configuration.WorkflowConfigParser) – The in-memory representation of the configuration (.ini) files.
• out_dir (path) – Directory in which to put output files.
• ifo_string (string) – String containing all active ifos, i.e. "H1L1V1".
• tag (list of strings, optional (default=None)) – Use this to specify a tag(s). This can be used if this module is being called more than once to give call specific configuration (by setting options in [workflow-datafind-${TAG}] rather than [workflow-datafind]). This is also used to tag the Files returned by the class to uniqueify the Files and uniqueify the actual filename. FIXME: Filenames may not be unique with current codes!

Returns:
• job (Job instance) – The Job instance that will run segments_from_cats jobs.

pycbc.workflow.segment.file_needs_generating(file_path, cp, tags=None)

This job tests the file location and determines if the file should be generated now or if an error should be raised. This uses the generate_segment_files variable, global to this module, which is described above and in the documentation.

Parameters:
• file_path (path) – Location of file to check.
• cp (ConfigParser) – The associated ConfigParser from which the segments-generate-segment-files variable is returned. It is recommended for most applications to use the default option by leaving segments-generate-segment-files blank, which will regenerate all segment files at runtime. Only use this facility if you need it. Choices are:
  • 'always': DEFAULT: All files will be generated even if they already exist.
  • 'if_not_present': Files will be generated if they do not already exist. Pre-existing files will be read in and used.
  • 'error_on_duplicate': Files will be generated if they do not already exist. Pre-existing files will raise a failure.
  • 'never': Pre-existing files will be read in and used. If no file exists the code will fail.

Returns:
• int – 1 = Generate the file. 0 = File already exists, use it. Other cases will raise an error.

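The four choices can be summarised as a small decision function. This is a sketch of the behaviour described above, not the PyCBC implementation: `needs_generating` is a hypothetical name, it takes the mode directly rather than reading it from a ConfigParser, and the exception type raised in the failure cases is our assumption.

```python
import os
import tempfile


def needs_generating(file_path, mode='always'):
    """Return 1 to generate the file, 0 to reuse the existing one."""
    exists = os.path.isfile(file_path)
    if mode == 'always':
        return 1
    if mode == 'if_not_present':
        return 0 if exists else 1
    if mode == 'error_on_duplicate':
        if exists:
            raise ValueError("%s already exists" % file_path)
        return 1
    if mode == 'never':
        if not exists:
            raise ValueError("%s does not exist" % file_path)
        return 0
    raise ValueError("unknown mode %r" % mode)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, chosen only for the example.
    path = os.path.join(tmp_dir, 'H1-SCIENCE_SEGMENTS.xml')
    print(needs_generating(path, 'if_not_present'))  # 1: nothing on disk yet
    open(path, 'w').close()
    print(needs_generating(path, 'if_not_present'))  # 0: reuse existing file
    print(needs_generating(path, 'always'))          # 1: regenerate regardless
```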
pycbc.workflow.segment.find_playground_segments(segs)

Finds playground time in a list of segments.

Playground segments include the first 600s of every 6370s stride starting at GPS time 729273613.

Parameters:
• segs (segmentfilelist) – A segmentfilelist to find playground segments.

Returns:
• outlist (segmentfilelist) – A segmentfilelist with all playground segments during the input segmentfilelist (i.e. segs).

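The playground definition is simple arithmetic, sketched below for a single `(start, end)` interval of integer GPS seconds. The function name and the tuple representation are ours; the PyCBC function operates on segment-list objects instead.

```python
PLAYGROUND_START = 729273613   # GPS time of the first playground stride
PLAYGROUND_STRIDE = 6370       # seconds between stride starts
PLAYGROUND_LENGTH = 600        # playground seconds at the start of each stride


def playground_in_segment(start, end):
    """Return the playground intervals that overlap [start, end)."""
    out = []
    # Index of the stride containing `start`, clamped to the first stride.
    n = max((start - PLAYGROUND_START) // PLAYGROUND_STRIDE, 0)
    pg_start = PLAYGROUND_START + n * PLAYGROUND_STRIDE
    while pg_start < end:
        lo = max(pg_start, start)
        hi = min(pg_start + PLAYGROUND_LENGTH, end)
        if lo < hi:
            out.append((lo, hi))
        pg_start += PLAYGROUND_STRIDE
    return out


# A 7000 s segment starting exactly on a stride boundary contains the
# playground time of two strides.
print(playground_in_segment(729273613, 729273613 + 7000))
# [(729273613, 729274213), (729279983, 729280583)]
```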
pycbc.workflow.segment.generate_triggered_segment(workflow, out_dir, sciencesegs)

pycbc.workflow.segment.get_analyzable_segments(workflow, sci_segs, cat_files, out_dir, tags=None)

Get the analyzable segments after applying ini specified vetoes and any other restrictions on the science segs, e.g. a minimum segment length, or demanding that only coincident segments are analysed.

Parameters:
• workflow (Workflow object) – Instance of the workflow object.
• sci_segs (Ifo-keyed dictionary of glue.segmentlists) – The science segments for each ifo to which the vetoes, or any other restriction, will be applied.
• cat_files (FileList of SegFiles) – The category veto files generated by get_veto_segs.
• out_dir (path) – Location to store output files.
• tags (list of strings) – Used to retrieve subsections of the ini file for configuration options.

Returns:
• sci_ok_seg_file (workflow.core.SegFile instance) – The segment file combined from all ifos containing the analyzable science segments.
• sci_ok_segs (Ifo keyed dict of ligo.segments.segmentlist instances) – The analyzable science segs for each ifo, keyed by ifo.
• sci_ok_seg_name (str) – The name with which analyzable science segs are stored in the output XML file.

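As a small illustration of the minimum-segment-length restriction mentioned above, the sketch below filters an ifo-keyed dictionary of plain `(start, end)` tuples. The helper name, the 2000 s threshold and the segment values are all invented for the example; PyCBC applies the restriction to real segment-list objects.

```python
def drop_short_segments(segs, min_length):
    """Keep only (start, end) segments at least `min_length` seconds long."""
    return [(start, end) for start, end in segs if end - start >= min_length]


# Invented science segments, keyed by ifo.
science = {
    'H1': [(0, 50), (100, 2200), (3000, 3100)],
    'L1': [(0, 4000)],
}
analysable = {ifo: drop_short_segments(segs, 2000)
              for ifo, segs in science.items()}
print(analysable)
# {'H1': [(100, 2200)], 'L1': [(0, 4000)]}
```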
pycbc.workflow.segment.get_cumulative_segs(workflow, categories, seg_files_list, out_dir, tags=None, execute_now=False, segment_name=None)

Function to generate one of the cumulative, multi-detector segment files as part of the workflow.

Parameters:
• workflow (pycbc.workflow.core.Workflow) – An instance of the Workflow class that manages the workflow.
• categories (int) – The veto categories to include in this cumulative veto.
• seg_files_list (List of SegFiles) – The list of segment files to be used as input for combining.
• out_dir (path) – The directory to write output to.
• tags (list of strings, optional) – A list of strings that is used to identify this job.
• execute_now (boolean, optional) – If true, jobs are executed immediately. If false, they are added to the workflow to be run later.
• segment_name (str) – The name of the combined, cumulative segments in the output file.

pycbc.workflow.segment.get_cumulative_veto_group_files(workflow, option, cat_files, out_dir, execute_now=True, tags=None)

Get the cumulative veto files that define the different backgrounds we want to analyze, defined by groups of vetoes.

Parameters:
• workflow (Workflow object) – Instance of the workflow object.
• option (str) – ini file option to use to get the veto groups.
• cat_files (FileList of SegFiles) – The category veto files generated by get_veto_segs.
• out_dir (path) – Location to store output files.
• execute_now (Boolean) – If true outputs are generated at runtime. Else jobs go into the workflow and are generated then.
• tags (list of strings) – Used to retrieve subsections of the ini file for configuration options.

Returns:
• seg_files (workflow.core.FileList instance) – The cumulative segment files for each veto group.
• names (list of strings) – The segment names for the corresponding seg_file.
• cat_files (workflow.core.FileList instance) – The list of individual category veto files.

pycbc.workflow.segment.get_files_for_vetoes(workflow, out_dir, runtime_names=None, in_workflow_names=None, tags=None)

Get the various sets of veto segments that will be used in this analysis.

Parameters:
• workflow (Workflow object) – Instance of the workflow object.
• out_dir (path) – Location to store output files.
• runtime_names (list) – Veto category groups with these names in the [workflow-segment] section of the ini file will be generated now.
• in_workflow_names (list) – Veto category groups with these names in the [workflow-segment] section of the ini file will be generated in the workflow. If a veto category appears here and in runtime_names, it will be generated now.
• tags (list of strings) – Used to retrieve subsections of the ini file for configuration options.

Returns:
• veto_seg_files (FileList) – List of veto segment files generated.

pycbc.workflow.segment.get_sci_segs_for_ifo(ifo, cp, start_time, end_time, out_dir, tags=None)

Obtain science segments for the selected ifo.

Parameters:
• ifo (string) – The string describing the ifo to obtain science times for.
• start_time (gps time (either int/LIGOTimeGPS)) – The time at which to begin searching for segments.
• end_time (gps time (either int/LIGOTimeGPS)) – The time at which to stop searching for segments.
• out_dir (path) – The directory in which output will be stored.
• tag (string, optional (default=None)) – Use this to specify a tag. This can be used if this module is being called more than once to give call specific configuration (by setting options in [workflow-datafind-${TAG}] rather than [workflow-datafind]). This is also used to tag the Files returned by the class to uniqueify the Files and uniqueify the actual filename.

Returns:
• sci_segs (ligo.segments.segmentlist) – The segmentlist generated by this call.
• sci_xml_file (pycbc.workflow.core.SegFile) – The workflow File object corresponding to this science segments file.
• out_sci_seg_name (string) – The name of the output segment list in the output XML file.

pycbc.workflow.segment.get_science_segments(workflow, out_dir, tags=None)

Get the analyzable segments after applying ini specified vetoes.

Parameters:
• workflow (Workflow object) – Instance of the workflow object.
• out_dir (path) – Location to store output files.
• tags (list of strings) – Used to retrieve subsections of the ini file for configuration options.

Returns:
• sci_seg_file (workflow.core.SegFile instance) – The segment file combined from all ifos containing the science segments.
• sci_segs (Ifo keyed dict of ligo.segments.segmentlist instances) – The science segs for each ifo, keyed by ifo.
• sci_seg_name (str) – The name with which science segs are stored in the output XML file.

pycbc.workflow.segment.get_segments_file(workflow, name, option_name, out_dir)

Get cumulative segments from option name syntax for each ifo.

Use the syntax of the configparser string to define the resulting segment file, e.g.
option_name = +up_flag1,+up_flag2,+up_flag3,-down_flag1,-down_flag2

Each ifo may have a different string, and each is stored separately in the file. Flags which add time must precede flags which subtract time.

Parameters:
• workflow (pycbc.workflow.Workflow)
• name (string) – Name of the segment list being created.
• option_name (str) – Name of the option in the associated config parser from which to get the flag list.

Returns:
• seg_file (pycbc.workflow.SegFile) – SegFile instance that points to the segment XML file on disk.

pycbc.workflow.segment.get_triggered_coherent_segment(workflow, sciencesegs)

Construct the coherent network on and off source segments. Can switch to construction of segments for a single IFO search when coherent segments are insufficient for a search.

Parameters:
• workflow (pycbc.workflow.core.Workflow) – The workflow instance that the calculated segments belong to.
• sciencesegs (dict) – Dictionary of all science segments within analysis time.

Returns:
• onsource (ligo.segments.segmentlistdict) – A dictionary containing the on source segments for network IFOs.
• offsource (ligo.segments.segmentlistdict) – A dictionary containing the off source segments for network IFOs.

pycbc.workflow.segment.get_veto_segs(workflow, ifo, category, start_time, end_time, out_dir, veto_gen_job, tags=None, execute_now=False)

Obtain veto segments for the selected ifo and veto category and add the job to generate this to the workflow.

Parameters:
• workflow (pycbc.workflow.core.Workflow) – An instance of the Workflow class that manages the workflow.
• ifo (string) – The string describing the ifo to generate vetoes for.
• category (int) – The veto category to generate vetoes for.
• start_time (gps time (either int/LIGOTimeGPS)) – The time at which to begin searching for segments.
• end_time (gps time (either int/LIGOTimeGPS)) – The time at which to stop searching for segments.
• out_dir (path) – The directory in which output will be stored.
• vetoGenJob (Job) – The veto generation Job class that will be used to create the Node.
• tag (string, optional (default=None)) – Use this to specify a tag. This can be used if this module is being called more than once to give call specific configuration (by setting options in [workflow-datafind-${TAG}] rather than [workflow-datafind]). This is also used to tag the Files returned by the class to uniqueify the Files and uniqueify the actual filename. FIXME: Filenames may not be unique with current codes!
• execute_now (boolean, optional) – If true, jobs are executed immediately. If false, they are added to the workflow to be run later.

Returns:
• veto_def_file (pycbc.workflow.core.SegFile) – The workflow File object corresponding to this DQ veto file.

pycbc.workflow.segment.parse_cat_ini_opt(cat_str)

Parse a cat str from the ini file into a list of sets.

pycbc.workflow.segment.save_veto_definer(cp, out_dir, tags=None)

Retrieve the veto definer file and save it locally.

Parameters:
• cp (ConfigParser instance)
• out_dir (path)
• tags (list of strings) – Used to retrieve subsections of the ini file for configuration options.

pycbc.workflow.segment.setup_segment_gen_mixed(workflow, veto_categories, out_dir, maxVetoAtRunTime, tag=None, generate_coincident_segs=True)

This function will generate veto files for each ifo and for each veto category. It can generate these vetoes at run-time or in the workflow (or do some at run-time and some in the workflow). However, the CAT_1 vetoes and science time must be generated at run time as they are needed to plan the workflow. CATs 2 and higher may be needed for other workflow construction. It can also combine these files to create a set of cumulative, multi-detector veto files, which can be used in ligolw_thinca and in pipedown. Again these can be created at run time or within the workflow.

Parameters:
• workflow (pycbc.workflow.core.Workflow) – The Workflow instance that the coincidence jobs will be added to. This instance also contains the ifos for which to attempt to obtain segments for this analysis and the start and end times to search for segments over.
• veto_categories (list of ints) – List of veto categories to generate segments for. If this stops being integers, this can be changed here.
• out_dir (path) – The directory in which output will be stored.
• maxVetoAtRunTime (int) – Generate veto files at run time up to this category. Veto categories beyond this in veto_categories will be generated in the workflow. If we move to a model where veto categories are not explicitly cumulative, this will be rethought.
• tag (string, optional (default=None)) – Use this to specify a tag. This can be used if this module is being called more than once to give call specific configuration (by setting options in [workflow-datafind-${TAG}] rather than [workflow-datafind]). This is also used to tag the Files returned by the class to uniqueify the Files and uniqueify the actual filename. FIXME: Filenames may not be unique with current codes!
• generate_coincident_segs (boolean, optional (default = True)) – If given this module will generate a set of coincident, cumulative veto files that can be used with ligolw_thinca and pipedown.

Returns:
• segFilesList (dictionary of pycbc.workflow.core.SegFile instances) – These are representations of the various segment files that were constructed at this stage of the workflow and may be needed at later stages of the analysis (e.g. for performing DQ vetoes). If the file was generated at run-time the segment lists contained within these files will be an attribute of the instance. (If it will be generated in the workflow it will not be because I am not psychic).

pycbc.workflow.segment.setup_segment_generation(workflow, out_dir, tag=None)

This function is the gateway for setting up the segment generation steps in a workflow.
It is designed to support multiple ways of obtaining these segments and to combine or edit such files as necessary for analysis. The current modules have the capability to generate files at runtime or to generate files that are not needed for workflow generation within the workflow.

Parameters:
• workflow (pycbc.workflow.core.Workflow) – The workflow instance that the coincidence jobs will be added to. This instance also contains the ifos for which to attempt to obtain segments for this analysis and the start and end times to search for segments over.
• out_dir (path) – The directory in which output will be stored.
• tag (string, optional (default=None)) – Use this to specify a tag. This can be used if this module is being called more than once to give call specific configuration (by setting options in [workflow-datafind-${TAG}] rather than [workflow-datafind]). This is also used to tag the Files returned by the class to uniqueify the Files and uniqueify the actual filename. FIXME: Filenames may not be unique with current codes!

Returns:
• segsToAnalyse (dictionary of ifo-keyed glue.segment.segmentlist instances) – This will contain the times that your code should analyse. By default this is science time - CAT_1 vetoes. (This default could be changed if desired)
• segFilesList (pycbc.workflow.core.FileList of SegFile instances) – These are representations of the various segment files that were constructed at this stage of the workflow and may be needed at later stages of the analysis (e.g. for performing DQ vetoes). If the file was generated at run-time the segment lists contained within these files will be an attribute of the instance. (If it will be generated in the workflow it will not be because I am not psychic).