Running Workflows
You can run workflows from the command line using the command dx run. The inputs to these workflows can come from any project for which you have VIEW access.
The examples here use the publicly available Exome Analysis Workflow (platform login required to access this link).
For information on how to run a Nextflow pipeline, see here.
Running in Interactive Mode
If dx run is run without specifying an input, interactive mode will be launched. You will then be prompted to enter each required input, after which you will be given the option to select from a list of optional parameters to modify; this list includes all parameters that can be modified for each stage of the workflow. The interface will then output a JSON detailing the inputs specified and generate an analysis ID of the form analysis-xxxx that is unique to this particular run of the workflow.
Below is an example of running the Exome Analysis Workflow from the public "Exome Analysis Demo" project.
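A minimal sketch of launching interactive mode is shown here; workflow-xxxx is a placeholder for the Exome Analysis Workflow's actual ID, and the exact prompts depend on the workflow's input specification.

$ dx select "Exome Analysis Demo"   # select the public demo project containing the workflow
$ dx run workflow-xxxx              # no -i flags given, so interactive mode launches and prompts for each input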
Running in Non-Interactive Mode
You can specify each input on the command line using the -i or --input flag with the syntax -i<stage ID>.<input name>=<input value>. <input value> must be a DNAnexus object ID or the name of a file in the currently selected project. It is also possible to specify the number of a stage in place of its stage ID, where stages are indexed starting at zero; the inputs in the following example are specified for the first stage of the workflow only, to illustrate this point. Note that the parentheses around the <input value> in the help string are omitted when entering input.
Possible values for the input name field can be found by running the command dx run workflow-xxxx -h, as shown below using the Exome Analysis Workflow.
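For reference, the invocation looks like the following; workflow-xxxx is a placeholder for the workflow's actual ID, and the full help text (not reproduced here) is printed to the terminal.

$ dx run workflow-xxxx -h   # prints required inputs, common options, and advanced options for each stage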
This help message describes the inputs for each stage of the workflow in the order they are specified. For each stage, the help message first lists the required inputs, specifying the requisite type in the <input value> field. Next, it describes the common options for that stage (as seen in that stage's corresponding UI on the platform). Lastly, it lists the advanced command-line options for that stage. If any stage's input is linked to the output of a prior stage, the help message shows the default value for that input as a DNAnexus link of the form {"$dnanexus_link": {"outputField": "<prior stage output name>", "stage": "stage-xxxx"}}.
Similarly, this link format can be used to specify the output of any prior stage in the workflow as input to the current stage. We see that the Exome Analysis Workflow has one required file array input in addition to those already specified by default: -ibwa_mem_fastq_read_mapper.reads_fastqgzs. As these inputs are for the first stage of the Exome Analysis Workflow, the bwa_mem_fastq_read_mapper stage ID can be replaced with 0.
Workflow stages are zero-indexed; the first stage of a workflow is denoted as stage 0.
The example below shows how to run the same Exome Analysis Workflow on a FASTQ file containing reads, as well as a BWA reference genome, using the default parameters for each subsequent stage.
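A sketch of such a command is shown below. The file IDs are placeholders, and the reference-genome input name (genomeindex_targz) is assumed from the BWA-MEM FASTQ Read Mapper app's input specification and may differ in your copy of the workflow.

# file-aaaa: gzipped FASTQ reads; file-bbbb: BWA reference genome index (assumed input name)
$ dx run workflow-xxxx \
    -i0.reads_fastqgzs=file-aaaa \
    -i0.genomeindex_targz=file-bbbb \
    -y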
Specifying Array Input
Array input can be specified by providing multiple -i flags for a single parameter of a stage. For example, the following flags would add files 1 through 3 to the file_inputs parameter for stage-xxxx of the workflow:
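(The file IDs below are placeholders for the three files' actual IDs; these flags would be appended to a dx run command for the workflow.)

-istage-xxxx.file_inputs=file-1111
-istage-xxxx.file_inputs=file-2222
-istage-xxxx.file_inputs=file-3333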
If no project is selected, or if the file is in another project, the project containing the files you wish to use must be specified as follows: -i<stage ID>.<input name>=<project id>:<file id>.
Job-Based Object References (JBORs)
The -i flag can also be used to specify job-based object references (JBORs) with the syntax -i<stage ID or number>.<input name>=<job id>:<output name>. When used with dx run, the --brief flag makes the command return only the execution's ID, and the -y flag skips the interactive prompt asking you to confirm the execution.
The example below calls the BWA-MEM FASTQ Read Mapper app (platform login required to access this link) to produce the sorted_bam output described in the help string printed by dx run app-bwa_mem_fastq_read_mapper -h. This output is then used as input to the first stage of the Parliament Workflow featured on the DNAnexus Platform (platform login required to access this link).
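A sketch of this pattern follows. The app's input names (reads_fastqgzs, genomeindex_targz) and the Parliament Workflow's first-stage input name (illumina_bam) are assumptions here; check the corresponding help output for the exact names. File and workflow IDs are placeholders.

# Run the mapping app; --brief returns only the job ID and -y skips the confirmation prompt
$ job_id=$(dx run app-bwa_mem_fastq_read_mapper \
    -ireads_fastqgzs=file-aaaa \
    -igenomeindex_targz=file-bbbb \
    --brief -y)

# Pass the job's sorted_bam output (a JBOR) to the first stage of the Parliament Workflow
$ dx run workflow-xxxx \
    -i0.illumina_bam="$job_id":sorted_bam \
    -y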
Advanced Options
Quiet Output
Using the --brief flag at the end of a dx run command will cause the command line to print the execution's analysis ID ("analysis-xxxx") instead of the input JSON for the execution. This ID can be saved for later reference.
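For example (with placeholder IDs), the following prints only the analysis ID of the new run:

$ dx run workflow-xxxx -i0.reads_fastqgzs=file-aaaa -y --brief
# prints something like: analysis-xxxx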
Rerunning Analyses With Modified Settings
To modify specific settings from a previous analysis, you can run the command dx run --clone analysis-xxxx [options]. The [options] parameters will override anything set by the --clone flag, and they take the form of options passed as input from the command line.
Note that the --clone flag will not copy the usage of the --allow-ssh or --debug-on flags, which must be set with the new execution; only the applet, instance type, and input spec are copied. See the Connecting to Jobs page for more information on the usage of these flags.
For example, the command below redirects the output of the analysis to the outputs/ folder and reruns all stages:
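(A sketch of such a command; analysis-xxxx is a placeholder for the previous analysis ID.)

$ dx run --clone analysis-xxxx --destination outputs/ --rerun-stage "*" -y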
Only the outputs of the stages that are rerun are placed in the specified destination.
Rerunning Specific Stages
When rerunning workflows, if a stage is run identically to how it was run in a previous analysis, the stage itself will not be rerun, and the outputs of that stage will not be copied or rewritten in a new location. To force a stage to be run again, use the option --rerun-stage STAGE_ID, where STAGE_ID is an ID of the form stage-xxxx, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0). If you wish to rerun all stages of an analysis, you can use --rerun-stage "*", where the asterisk is enclosed in quotes to prevent the shell from expanding it, via globbing, into the names of all folders in your current directory.
The command below reruns the third and final stage of analysis-xxxx:
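(Stages are zero-indexed, so the third stage is index 2; analysis-xxxx is a placeholder for the previous analysis ID.)

$ dx run --clone analysis-xxxx --rerun-stage 2 -y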
Specifying Analysis Output Folders
The --destination flag allows you to specify the path to which the output of a workflow is written. By default, every output of every stage will be written to the specified destination.
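For example (with placeholder IDs), the following writes all stage outputs to the /exome_outputs folder of project-xxxx:

$ dx run workflow-xxxx -i0.reads_fastqgzs=file-aaaa --destination project-xxxx:/exome_outputs -y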
Specifying Stage Output Folders
You can use the --stage-output-folder <stage_ID> <folder> option to specify the output destination of a particular stage in the analysis being run, where stage_ID is the stage's name or the index of that stage (where the first stage of a workflow is indexed at 0), and folder is the project and path to which you wish the stage to write, using the syntax project-xxxx:/PATH, where PATH is the path to the folder in project-xxxx to which outputs should be written.
The following command reruns all stages of analysis-xxxx and sets the output destination of the first step of the workflow (BWA) to "mappings" in the current project:
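(A sketch; analysis-xxxx is a placeholder, and stage index 0 refers to the BWA mapping stage of this workflow.)

$ dx run --clone analysis-xxxx --rerun-stage "*" --stage-output-folder 0 mappings -y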
Specifying Stage-Relative Output Folders
If you want to specify the output folder of a stage within the current output folder of the entire analysis, you can use the flag --stage-relative-output-folder <stage_id> <folder>, where stage_id is the stage's ID (stage-xxxx), its name, or the index of that stage (where the first stage of a workflow is indexed at 0). For the folder argument, specify a quoted path, relative to the output folder of the analysis, to which the output of that stage should be written.
The following command reruns all stages of analysis-xxxx, setting the output destination of the analysis to /exome_run and the output destination of stage 0 to /exome_run/mappings in the current project:
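(A sketch; analysis-xxxx is a placeholder for the previous analysis ID.)

$ dx run --clone analysis-xxxx --rerun-stage "*" \
    --destination /exome_run \
    --stage-relative-output-folder 0 "mappings" \
    -y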
Specifying a Different Instance Type
If you wish to specify the instance type for all stages in your analysis, or for a specific set of stages, you can do so with the --instance-type flag. Specifically, the format --instance-type STAGE_ID=INSTANCE_TYPE sets the instance type of a specific stage, while --instance-type INSTANCE_TYPE sets a single instance type for all stages. The two options can be combined; for example, --instance-type mem2_ssd1_x2 --instance-type my_stage_0=mem3_ssd1_x16 will set all stages' instance types to mem2_ssd1_x2 except for the stage my_stage_0, for which mem3_ssd1_x16 will be used.
Here STAGE_ID is the ID of a stage, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0).
The example below reruns all stages of analysis-xxxx and specifies that the first and second stages should be run on mem1_ssd2_x8 and mem1_ssd2_x16 instances, respectively:
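(A sketch; analysis-xxxx is a placeholder, and stages are addressed by index, with the first stage at 0.)

$ dx run --clone analysis-xxxx --rerun-stage "*" \
    --instance-type 0=mem1_ssd2_x8 \
    --instance-type 1=mem1_ssd2_x16 \
    -y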
Adding Metadata to an Analysis
This is identical to adding metadata to a job; see Adding metadata to a job for details.
Monitoring an Analysis
It is not possible to monitor an analysis from the command line. For information about monitoring a job from the command line, see Monitoring Executions.
On the DNAnexus Platform, jobs are limited to a runtime of 30 days. Jobs running longer than 30 days will be automatically terminated.
Providing Input JSON
This is identical to providing an input JSON to a job; for more information, see Providing input JSON.
Note that, as when running a workflow in non-interactive mode, inputs to a workflow must be specified as STAGE_ID.<input>, where STAGE_ID is either an ID of the form stage-xxxx or the index of that stage in the workflow (starting with the first stage at index 0).
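A sketch, assuming the first stage takes the reads_fastqgzs file-array input used earlier (the file ID is a placeholder), using the -f/--input-json-file option to read the JSON from a file:

# input.json
{
  "0.reads_fastqgzs": [
    {"$dnanexus_link": "file-aaaa"}
  ]
}

$ dx run workflow-xxxx -f input.json -y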