Workflow Metadata
Use workflow metadata to allow the dx build command to build a workflow according to your specifications.
The file dxworkflow.json is a DNAnexus workflow metadata file. If a dxworkflow.json file is detected in the directory provided to dx build, the toolkit attempts to build a workflow on the Platform according to the workflow specifications in the JSON file.
The format of the file closely resembles that of the corresponding calls to /workflow/new.
The next section shows a detailed example of the fields used in the file.
Annotated Example
The following example lists the contents of a sample dxworkflow.json file that can be placed in a directory for use with the dx build command.
Comments shown below are for reference only and are not valid in the JSON format.
{
// (optional for regular, project-based workflows; required for global workflows)
// Workflow name
"name": "exome_variant_calling",
// (optional) Title of a workflow, used in display, search, or listing in the UI or CLI
"title": "Exome Variant Calling",
// (optional for regular, project-based workflows; required for global workflows)
// Version of the global workflow
"version": "1.0.0",
// (optional) A short description of the workflow
"summary": "A simple exome pipeline",
// (optional) Folder for the workflow's output
"outputFolder": "/output",
// (global workflow only) Specify a resource container to be made accessible
// to all apps/applets run by the workflow. Requires all apps/applets to be
// compiled with the "allProjects": VIEW permission.
"regionalOptions": {
"aws:us-east-1": {
"resources": "project-xxxx"
}
},
// (optional) Workflow level input specification (see API documentation)
"inputs": [
{
// Name of the workflow-level input
"name": "reads",
// Class of the workflow-level input
"class": "array:file",
// (optional) help for this workflow-level input
"help": "An array of FASTQ gzipped files"
}
],
// (optional) Workflow level output specification (see API documentation)
"outputs": [
{
// Name of the workflow-level output
"name": "variants",
// Class of the workflow-level output
"class": "file",
// Link to the output of the stage which provides the output of the workflow
"outputSource": {
"$dnanexus_link": {
"stage": "call_variants",
"outputField": "variants_vcfgz"
}
}
}
],
// (optional) A list of stages
"stages": [
{
// Unique ID of the first stage
"id": "align_reads",
// (optional) Display name of the first stage
"name": "BWA MEM",
// Name or ID of the app or ID of the applet run in this stage
"executable": "app-bwa_mem_fastq_read_mapper/2.0.4",
// The output folder to which outputs of this stage should be cloned.
// Folder paths can be absolute ("/foo/bar") or relative ("foo/bar") to
// the workflow's outputFolder.
"folder": "map_reads_output",
// (optional) Input of the first stage
"input": {
// Input field name
"genomeindex_targz": {
// Link to a reference genome file
"$dnanexus_link": {
"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
"id": "file-B6ZY4942J35xX095VZyQBk0v"
}
},
// Input field name
"reads_fastqgzs": {
// Link to the workflow level input; the input passed to "reads" on
// the workflow level will be consumed by the "reads_fastqgzs" input
"$dnanexus_link": {
"workflowInputField": "reads"
}
}
},
// (optional) Request different instance types for different entry points
// of this stage
"systemRequirements": {
// "main" is the name of the entry point called when a stage is run
"main": {
"instanceType": "mem1_ssd1_v2_x16"
}
},
// (optional) Options governing job restart policy
"executionPolicy": {
// Restart automatically up to 3 times for all errors
"restartOn": {
"*": 3
}
}
},
{
// Unique ID of the second stage
"id": "call_variants",
// (optional) Display name of the second stage
"name": "Freebayes",
// Name or ID of the app/globalworkflow or ID of the applet/workflow
// run in this stage
"executable": "app-freebayes/2.0.1",
// The output folder to which outputs of this stage should be cloned.
// Folder paths can be absolute ("/foo/bar") or relative ("foo/bar") to
// the workflow's outputFolder.
"folder": "call_variants_output",
// (optional) Input of the second stage which is linked to the output of
// "sorted_bam" of the first stage
"input": {
"sorted_bams": [
{
"$dnanexus_link": {
"stage": "align_reads",
"outputField": "sorted_bam"
}
}
],
"genome_fastagz": {
"$dnanexus_link": {
"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
"id": "file-B6ZY7VG2J35Vfvpkj8y0KZ01"
}
}
}
}
]
}
Other options for the /workflow/new call, such as the project or folder in which to create the workflow, are set via command-line flags of dx build.
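Because the comments in the annotated example are not valid JSON, a copy of it has to be cleaned before it can be parsed. The following is a minimal Python sketch, assuming every comment sits on its own line starting with // as in the example above. The helper name strip_line_comments is hypothetical and is not part of the DNAnexus toolkit.

```python
import json


def strip_line_comments(annotated: str) -> str:
    """Drop every line whose first non-blank characters are '//'.

    Only whole-line comments are removed, so '//' inside a string
    value (for example a URL) is left untouched.
    """
    kept = [
        line
        for line in annotated.splitlines()
        if not line.lstrip().startswith("//")
    ]
    return "\n".join(kept)


# A shortened copy of the annotated example, comments included.
annotated = """
{
  // Workflow name
  "name": "exome_variant_calling",
  // (optional) Folder for the workflow's output
  "outputFolder": "/output"
}
"""

workflow = json.loads(strip_line_comments(annotated))
print(workflow["name"])
```

The sketch deliberately ignores trailing comments on the same line as a value, since the annotated example does not use them.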
Specification
name
string. The name of the workflow. If it is not provided, the auto-generated workflow ID is used. When a global workflow is built (with dx build --globalworkflow), the name is required and stricter formatting rules apply: the name can contain lowercase letters, numbers, "-", ".", and "_", but cannot contain spaces.
Example:
{
"name": "exome_variant_calling"
}
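The naming rule for global workflows can be checked locally before running dx build. This is a minimal Python sketch that encodes only the rule stated above (lowercase letters, numbers, "-", ".", and "_", with no spaces). The function name is hypothetical, and the Platform may enforce further constraints that are not covered here.

```python
import re

# Allowed characters, taken from the rule described above.
GLOBAL_NAME_RE = re.compile(r"[a-z0-9._-]+")


def is_valid_global_workflow_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return GLOBAL_NAME_RE.fullmatch(name) is not None


print(is_valid_global_workflow_name("exome_variant_calling"))
print(is_valid_global_workflow_name("Exome Variant Calling"))
```

The first call accepts the name used in the example, while the second rejects the title because it contains uppercase letters and spaces.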
title
string. The title of the workflow, displayed to users as a label in the Web Interface. If it is not provided, the name of the workflow is used.
Example:
{
"title": "Exome Variant Calling"
}
version
string (global workflows only). The version of the workflow. This version must be distinct from all other versions of the global workflow (published or not).
We recommend following the Semantic Versioning conventions for numbering the versions of your global workflow. Semantic Versioning specifies how to change the version number for each type of update: bug-fix-only updates, backwards-compatible changes, and backwards-incompatible modifications. Following these guidelines helps users and other developers understand when it is safe to move between versions of your global workflow.
Example:
{
"version": "1.0.0"
}
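As an illustration of the Semantic Versioning convention, the sketch below maps the three kinds of update to the version component that changes: "patch" for bug-fix-only updates, "minor" for backwards-compatible changes, and "major" for backwards-incompatible modifications. The function bump_version is a hypothetical helper and assumes a plain MAJOR.MINOR.PATCH string with no pre-release or build suffix.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a given kind of change.

    change is one of:
      "patch" - bug-fix-only update
      "minor" - backwards-compatible change
      "major" - backwards-incompatible modification
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.0.0", "patch"))  # 1.0.1
print(bump_version("1.0.0", "minor"))  # 1.1.0
print(bump_version("1.0.0", "major"))  # 2.0.0
```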
summary
string. A short description of the workflow.
Example:
{
"summary": "A simple exome pipeline"
}
outputFolder
string (optional). The default output folder for the workflow.
Example:
{
"outputFolder": "/output"
}
inputs
array of mappings (optional). JSON array containing the specifications for each input to the workflow.
Example:
[
{
"name": "reads",
"class": "file",
"default": {"$dnanexus_link": "file-xxxx"}
}
]
outputs
array of mappings (optional). JSON array containing the specifications for each output of the workflow. The specification is the same as the output specification of an app(let), with the addition of the outputSource field, which lets the workflow developer link specific stage outputs to workflow outputs.
Example:
[
{
"name": "variants",
"class": "file",
"outputSource": {
"$dnanexus_link": {
"stage": "stage_id",
"outputField": "executable_output_fieldname"
}
}
}
]
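Links of the form {"$dnanexus_link": {"stage": ...}} and {"$dnanexus_link": {"workflowInputField": ...}} only make sense if the referenced stage ID or workflow-level input exists in the same file. The following Python sketch is one possible local consistency check. The helper names are hypothetical, it is not what dx build itself does, and it does not verify that an outputField exists on the referenced executable.

```python
def iter_links(value):
    """Yield the payload of every {"$dnanexus_link": {...}} mapping found."""
    if isinstance(value, dict):
        link = value.get("$dnanexus_link")
        if isinstance(link, dict):
            yield link
        for child in value.values():
            yield from iter_links(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_links(child)


def find_dangling_links(workflow: dict) -> list:
    """List links that point at a stage ID or workflow input not defined in the file."""
    stage_ids = {s["id"] for s in workflow.get("stages", []) if "id" in s}
    input_names = {i["name"] for i in workflow.get("inputs", [])}
    problems = []
    for link in iter_links(workflow):
        if "stage" in link and link["stage"] not in stage_ids:
            problems.append(f"unknown stage: {link['stage']}")
        if ("workflowInputField" in link
                and link["workflowInputField"] not in input_names):
            problems.append(f"unknown workflow input: {link['workflowInputField']}")
    return problems


# A trimmed version of the annotated example.
workflow = {
    "inputs": [{"name": "reads", "class": "array:file"}],
    "outputs": [{
        "name": "variants",
        "class": "file",
        "outputSource": {"$dnanexus_link": {
            "stage": "call_variants", "outputField": "variants_vcfgz"}},
    }],
    "stages": [
        {"id": "align_reads",
         "input": {"reads_fastqgzs": {
             "$dnanexus_link": {"workflowInputField": "reads"}}}},
        {"id": "call_variants",
         "input": {"sorted_bams": [{"$dnanexus_link": {
             "stage": "align_reads", "outputField": "sorted_bam"}}]}},
    ],
}

print(find_dangling_links(workflow))  # []
```

Links that carry "project" and "id" keys, such as the reference genome file in the example, are ignored by this check because they point at data objects rather than at parts of the workflow.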
stages
array of mappings (optional). A list of stages to add to the workflow. See the stages input field of the /workflow/new call for a detailed specification.
Example:
{
"stages": [
{
"id": "align_reads",
"name": "BWA MEM",
"executable": "app-bwa_mem_fastq_read_mapper/2.0.4",
"folder": "map_reads_output",
"input": {
"genomeindex_targz": {
"$dnanexus_link": {
"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
"id": "file-B6ZY4942J35xX095VZyQBk0v"
}
},
"reads_fastqgzs": {
"$dnanexus_link": {
"workflowInputField": "reads"
}
}
},
"systemRequirements": {
"main": {
"instanceType": "mem1_ssd1_v2_x16"
}
},
"executionPolicy": {
"restartOn": {
"*": 3
}
}
},
{
"id": "call_variants",
"name": "Freebayes",
"executable": "app-freebayes/2.0.1",
"folder": "call_variants_output",
"input": {
"sorted_bams": [{
"$dnanexus_link": {
"stage": "align_reads",
"outputField": "sorted_bam"
}
}],
"genome_fastagz": {
"$dnanexus_link":{
"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
"id": "file-B6ZY7VG2J35Vfvpkj8y0KZ01"
}
}
}
}
]
}
regionalOptions
You can specify the regions in which the workflow can run, and configure region-specific settings that control the workflow's behavior across multiple regions. The regionalOptions field should be a mapping with keys corresponding to each region in which the workflow should be runnable. A region is given by a string such as aws:us-east-1. If you don't specify regionalOptions, the workflow is enabled in only one region: the region of your project context when the workflow is built.
The values associated with each key are themselves mappings that configure the workflow's behavior in the corresponding region. Each value of regionalOptions may contain the following keys:
workflow
string (required). ID of the underlying workflow of this global workflow in the corresponding region. This must be a regular workflow stored as a data object in any project. The I/O specifications of all specified underlying workflows must be identical across regions.
resources
string or array of strings (optional). Either a string containing the ID of a project that is made available as a resources container, or an array of data object IDs that are all cloned into the root folder of the resources container. All specified objects must exist in the specified region and are accessible to the workflow when it runs in that region. If you specify resources for any region in regionalOptions, you must specify resources for every region listed in regionalOptions.
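The all-or-nothing rule for resources can be checked before building. This is a minimal Python sketch with a hypothetical helper name. The second region key, azure:westus, and the project-xxxx placeholder are illustrative assumptions, not values required by the Platform.

```python
def regions_missing_resources(regional_options: dict) -> list:
    """Return regions that lack "resources" when at least one region sets it.

    An empty list means the mapping is consistent: either every region
    specifies resources, or none does.
    """
    has_any = any("resources" in opts for opts in regional_options.values())
    if not has_any:
        return []
    return sorted(
        region
        for region, opts in regional_options.items()
        if "resources" not in opts
    )


# Illustrative regionalOptions value; region names and IDs are placeholders.
regional_options = {
    "aws:us-east-1": {"resources": "project-xxxx"},
    "azure:westus": {},
}

print(regions_missing_resources(regional_options))  # ['azure:westus']
```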