Workflow
This page describes the template structure used to define Workflow objects. These objects let users codify pipeline steps and store the metadata needed to track inputs, outputs, software, and description files (e.g., WDL or CWL) for each workflow.
Template
## Workflow information #####################################
# General information for the workflow
#############################################################
# All the following fields are required
name: <string>
description: <string>
runner:
  language: <language>  # cwl, wdl
  main: <file>          # .cwl or .wdl file
  child:
    - <file>            # .cwl or .wdl file
category:
  - <string>            # Annotation
# All the following fields are optional and provided as example,
# can be expanded to anything accepted by the schema
# https://github.com/smaht-dac/smaht-portal/tree/main/src/encoded/schemas
software:
  - <software>@<version|commit>
## Input information ########################################
# Input files and parameters
#############################################################
input:
  # File argument
  <file_argument_name>:
    argument_type: file.<format>  # bam, fastq, bwt, ...
  # Parameter argument
  <parameter_argument_name>:
    argument_type: parameter.<type>  # string, integer, float, array, boolean, object
## Output information #######################################
# Output files and quality controls
#############################################################
output:
  # File output
  <file_output_name>:
    argument_type: file.<format>
    secondary_files:
      - <format>  # bam, fastq, bwt, ...
  # QC output
  <qc_output_name>:
    argument_type: qc
    argument_to_be_attached_to: <file_output_name>
    # Fields to specify the output type
    #   either json or zipped folder
    json: <boolean>
    zipped: <boolean>
  # Report output
  <report_output_name>:
    argument_type: report
General Fields Definition
Required
All the following fields are required.
name
Name of the workflow, MUST BE GLOBALLY UNIQUE (ACROSS THE PORTAL OBJECTS).
description
Description of the workflow.
runner
Definition of the data processing flow for the workflow. This field is used to specify the standard language and description files used to define the workflow. Several subfields need to be specified:
language [required]: Language standard used for workflow description
main [required]: Main description file
child [optional]: List of supplementary description files used by main
At the moment we support two standards, Common Workflow Language (CWL) and Workflow Description Language (WDL).
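The rules for the runner subfields can be sketched as a small check. This is only an illustration of the constraints listed above, not the validation the portal actually performs; the function name check_runner and the file names in the usage lines are hypothetical.

```python
# Languages listed as supported in this section
SUPPORTED_LANGUAGES = {"cwl", "wdl"}


def check_runner(runner):
    """Return a list of problems found in a `runner` mapping (empty if none)."""
    problems = []

    # language [required]: must be one of the supported standards
    language = runner.get("language")
    if language not in SUPPORTED_LANGUAGES:
        problems.append(
            f"language must be one of {sorted(SUPPORTED_LANGUAGES)}, got {language!r}"
        )

    # main [required]: the main description file
    main = runner.get("main")
    if not isinstance(main, str) or not main:
        problems.append("main is required and must be a file name")

    # child [optional]: list of supplementary description files
    child = runner.get("child", [])
    if not isinstance(child, list) or not all(isinstance(c, str) for c in child):
        problems.append("child, when present, must be a list of file names")

    return problems


# Hypothetical file names, for illustration only
print(check_runner({"language": "cwl", "main": "align.cwl", "child": ["sort.cwl"]}))
print(check_runner({"language": "nextflow"}))
```

The first call reports no problems; the second reports an unsupported language and a missing main file.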
input
Description of input files and parameters for the workflow. See Input Definition.
output
Description of expected outputs for the workflow. See Output Definition.
Input Definition
Each argument is defined by its name. Additional subfields need to be specified depending on the argument type.
argument_type
Definition of the type of the argument.
For a file argument, the argument type is defined as file.<format>, where <format> is the format used by the file. <format> needs to match a file format that has been previously defined, see File Format.
For a parameter argument, the argument type is defined as parameter.<type>, where <type> is the type of the value expected for the argument [string, integer, float, array, boolean, object].
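As a rough illustration of how an input argument_type string breaks down, the sketch below splits it into its kind and its format or type. The helper name parse_input_argument_type is hypothetical, and the sketch does not check whether a file format has actually been defined on the portal.

```python
# Parameter types listed in this section
PARAMETER_TYPES = {"string", "integer", "float", "array", "boolean", "object"}


def parse_input_argument_type(argument_type):
    """Split 'file.<format>' or 'parameter.<type>' into (kind, detail)."""
    # Split on the first dot only, so the format or type keeps any later dots
    kind, sep, detail = argument_type.partition(".")
    if not sep or not detail:
        raise ValueError(f"expected '<kind>.<detail>', got {argument_type!r}")

    if kind == "file":
        # Assumption: the format is looked up against defined file formats elsewhere
        return kind, detail

    if kind == "parameter":
        if detail not in PARAMETER_TYPES:
            raise ValueError(f"unsupported parameter type {detail!r}")
        return kind, detail

    raise ValueError(f"unknown argument kind {kind!r}")


print(parse_input_argument_type("file.bam"))
print(parse_input_argument_type("parameter.integer"))
```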
Output Definition
Each output is defined by its name. Additional subfields need to be specified depending on the output type.
argument_type
Definition of the type of the output.
For a file output, the argument type is defined as file.<format>, where <format> is the format used by the file. <format> needs to match a file format that has been previously defined, see File Format.
For a report output, the argument type is defined as report.
For a QC (Quality Control) output, the argument type is defined as qc.
A QC can generate two different types of output: a JSON file of key-value pairs and a compressed file.
The JSON file can be used to create a summary report of the quality metrics generated by the QC process.
The compressed file can be used to store the original output for the QC, including additional data or graphs.
Both the JSON file and compressed file will be attached to the file specified as target by argument_to_be_attached_to with a QualityMetric object.
The content of the JSON file will be patched directly on the object, while the compressed file will be made available for download via a link.
The output type can be specified by setting json: True or zipped: True in the QC output definition.
Template for key-value pairs JSON:
{
  "name": "Quality metric name",
  "qc_values": [
    {
      "key": "Name of the key",
      "tooltip": "Tooltip for the key",
      "value": "Value for the key"
    }
  ]
}
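A QC step could produce a file in this shape with a few lines of code. The sketch below is one possible way to do it; build_qc_json, the metric name, and the metric values are hypothetical. Values are kept as strings to mirror the template, since this section does not state whether other value types are accepted.

```python
import json


def build_qc_json(name, metrics):
    """Build the key-value pairs structure from (key, value, tooltip) tuples."""
    return {
        "name": name,
        "qc_values": [
            {"key": key, "tooltip": tooltip, "value": value}
            for key, value, tooltip in metrics
        ],
    }


# Hypothetical metric, for illustration only
qc = build_qc_json(
    "Hypothetical alignment QC",
    [("Total reads", "1000000", "Number of reads in the file")],
)
print(json.dumps(qc, indent=2))
```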
secondary_files
This field can be used for output files.
List of <format> for secondary files associated with the output file. Each <format> needs to match a file format that has been previously defined, see File Format.
argument_to_be_attached_to
This field can be used for output QCs.
Name of the output file the QC is calculated for.
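Because argument_to_be_attached_to refers to another output by name, a mistyped name is an easy mistake to make. The sketch below shows one way to catch it before submitting a definition. It assumes the target must be a file output declared in the same output section, which follows the template but is not confirmed here as a rule the portal enforces; check_qc_targets and all output names are hypothetical.

```python
def check_qc_targets(output):
    """Return a problem message for each QC output whose target is not a file output."""
    # Names of all outputs declared with a file.<format> argument type
    file_outputs = {
        name
        for name, spec in output.items()
        if str(spec.get("argument_type", "")).startswith("file.")
    }

    problems = []
    for name, spec in output.items():
        if spec.get("argument_type") != "qc":
            continue
        target = spec.get("argument_to_be_attached_to")
        if target not in file_outputs:
            problems.append(
                f"{name}: argument_to_be_attached_to {target!r} is not a file output"
            )
    return problems


# Hypothetical output section, already loaded into a dictionary
output = {
    "output_bam": {"argument_type": "file.bam", "secondary_files": ["bai"]},
    "bam_qc": {
        "argument_type": "qc",
        "argument_to_be_attached_to": "output_bam",
        "json": True,
    },
    "orphan_qc": {
        "argument_type": "qc",
        "argument_to_be_attached_to": "missing_file",
        "zipped": True,
    },
}
print(check_qc_targets(output))
```

Here only orphan_qc is reported, because its target does not name any file output.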