# Running Workflows

You can run workflows from the command line using [`dx run`](https://documentation.dnanexus.com/helpstrings-of-sdk-command-line-utilities#run). Inputs to these workflows can come from any project for which you have *VIEW* access.

The examples here use the publicly available [Exome Analysis Workflow](https://platform.dnanexus.com/projects/BQfgzV80bZ46kf6pBGy00J38/data/) (platform login required to access this link).

For information on how to run a Nextflow pipeline, see [Running Nextflow Pipelines](https://documentation.dnanexus.com/user/running-apps-and-workflows/running-nextflow-pipelines).

## Running in Interactive Mode

Running `dx run` without specifying an input launches interactive mode. The system prompts for each required input, then offers a numbered list of optional parameters you can choose to modify. Optional parameters include all modifiable parameters for each stage of the workflow. Before launching, the interface prints the input JSON describing the inputs you specified. After you confirm, it generates an analysis ID of the form `analysis-xxxx` unique to this particular run of the workflow.

Below is an example of running the Exome Analysis Workflow from the public "Exome Analysis Demo" project.

```shell
$ dx run "Exome Analysis Demo:Exome Analysis Workflow"
Entering interactive mode for input selection.

Input:   Reads (bwa_mem_fastq_read_mapper.reads_fastqgzs)
Class:   array:file

Enter file values, one at a time (^D or <ENTER> to finish, <TAB> twice for compatible files in
    current directory, '?' for more options)
bwa_mem_fastq_read_mapper.reads_fastqgzs[0]: "Exome Analysis Demo:/Input/SRR504516_1.fastq.gz"


Select an optional parameter to set by its # (^D or <ENTER> to finish):

 [0] Reads (right mates) (bwa_mem_fastq_read_mapper.reads2_fastqgzs)
 [1] Read group information (bwa_mem_fastq_read_mapper.rg_info_csv)
.
.
.
 [33] Output prefix (gatk4_genotypegvcfs.prefix)
 [34] Extra command line options (gatk4_genotypegvcfs.extra_options) [default="-G StandardAnnotation --only-output-calls-starting-in-intervals"]

Optional param #: 0

Input:   Reads (right mates) (bwa_mem_fastq_read_mapper.reads2_fastqgzs)
Class:   array:file

Enter file values, one at a time (^D or <ENTER> to finish, <TAB> twice for compatible files in
   current directory, '?' for more options)
bwa_mem_fastq_read_mapper.reads2_fastqgzs[0]: "Exome Analysis Demo:/Input/SRR504516_2.fastq.gz"
bwa_mem_fastq_read_mapper.reads2_fastqgzs[1]:

Optional param #: <ENTER>

Using input JSON:
{
  "bwa_mem_fastq_read_mapper.reads_fastqgzs": [
    {
      "$dnanexus_link": {
        "project": "project-BQfgzV80bZ46kf6pBGy00J38",
        "id": "file-B40jg7v8KfPy38kjz1vQ001y"
      }
    }
  ],
  "bwa_mem_fastq_read_mapper.reads2_fastqgzs": [
    {
      "$dnanexus_link": {
        "project": "project-BQfgzV80bZ46kf6pBGy00J38",
        "id": "file-B40jgYG8KfPy38kjz1vQ0020"
      }
    }
  ]
}

Confirm running the executable with this input [Y/n]: <ENTER>
Calling workflow-xxxx with output destination project-xxxx:/

Analysis ID: analysis-xxxx
```

## Running in Non-Interactive Mode

You can specify each input on the command line with the `-i` or `--input` flag, using the syntax `-i<stage ID>.<input name>=<input value>`. The `<input value>` must be a DNAnexus object ID or the name of a file in the project you have selected. You can also give the number of a stage in place of its stage ID, where stages are indexed starting at zero. The inputs in the following example are specified for the first stage of the workflow only to illustrate this point. The help string shows the expected type of each `<input value>` in parentheses; omit the parentheses when entering input.

Possible values for the input name field can be found by running the command `dx run workflow-xxxx -h`, as shown below using the Exome Analysis Workflow.

```shell
$ dx run "Exome Analysis Demo:Exome Analysis Workflow" -h
usage: dx run Exome Analysis Demo:Exome Analysis Workflow [-iINPUT_NAME=VALUE ...]

Workflow: GATK4 Exome FASTQ to VCF (hs38DH)

Runs GATK4 Best Practice for Exome on hs38DH reference genome

Inputs:
 bwa_mem_fastq_read_mapper
  Reads: -ibwa_mem_fastq_read_mapper.reads_fastqgzs=(file) [-ibwa_mem_fastq_read_mapper.reads_fastqgzs=... [...]]
        An array of files, in gzipped FASTQ format, with the first read mates
        to be mapped.

  Reads (right mates): [-ibwa_mem_fastq_read_mapper.reads2_fastqgzs=(file) [-ibwa_mem_fastq_read_mapper.reads2_fastqgzs=... [...]]]
        (Optional) An array of files, in gzipped FASTQ format, with the second
        read mates to be mapped.
  BWA reference genome index: [-ibwa_mem_fastq_read_mapper.genomeindex_targz=(file, default={"$dnanexus_link": {"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv", "id": "file-FFJPKp0034KY8f20F6V9yYkk"}})]
        A file, in gzipped tar archive format, with the reference genome
        sequence already indexed with BWA.
  ...
 fastqc
  Reads: [-ifastqc.reads=(file, default={"$dnanexus_link": {"stage": "bwa_mem_fastq_read_mapper", "outputField": "sorted_bam"}})]
        A file containing the reads to be checked. Accepted formats are
        gzipped-FASTQ and BAM.
  ...
 gatk4_bqsr
  Sorted mappings: [-igatk4_bqsr.mappings_sorted_bam=(file, default={"$dnanexus_link": {"outputField": "sorted_bam", "stage": "bwa_mem_fastq_read_mapper"}})]
        A coordinate-sorted BAM or CRAM file with the base quality scores to
        be recalibrated.
   ...
 ...

Outputs:
  Sorted mappings: bwa_mem_fastq_read_mapper.sorted_bam (file)
        A coordinate-sorted BAM file with the resulting mappings.

  Sorted mappings index: bwa_mem_fastq_read_mapper.sorted_bai (file)
        The associated BAM index file.
  ...
  Variants index: gatk4_genotypegvcfs.variants_vcfgztbi (file)
        The associated TBI file.
```

This help message describes the inputs for each stage of the workflow in the order they are specified. For each stage of the workflow, the help message first lists the required inputs for that stage, specifying the requisite type in the `<input value>` field. Next, the message describes common options for that stage (as seen in that stage's corresponding UI on the platform). Lastly, it lists advanced command-line options for that stage. If any stage's input is linked to the output of a prior stage, the help message shows the default value for that input as a DNAnexus link of the form

`{"$dnanexus_link": {"outputField": "<prior stage output name>", "stage": "stage-xxxx" }}`.

This link format can also be used to specify output from any prior stage in the workflow as input for the current stage.
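For workflows with many stages, it can help to reduce the help text to just the input names. The following is a minimal sketch, assuming the `-i<stage>.<input>=` pattern seen in the help output above. `list_input_names` and `sample_help` are hypothetical names, and the sample lines are abridged from that output so the snippet runs without contacting the platform.

```shell
# Hypothetical helper: read `dx run ... -h` text on stdin and print each
# unique <stage>.<input> name that follows an -i flag.
list_input_names() {
  grep -o -e '-i[A-Za-z0-9_-]*\.[A-Za-z0-9_]*=' |
    sed -e 's/^-i//' -e 's/=$//' |
    sort -u
}

# Abridged lines from the help output above (stand-in for real dx output).
sample_help='  Reads: -ibwa_mem_fastq_read_mapper.reads_fastqgzs=(file) [-ibwa_mem_fastq_read_mapper.reads_fastqgzs=... [...]]
  Reads (right mates): [-ibwa_mem_fastq_read_mapper.reads2_fastqgzs=(file) [-ibwa_mem_fastq_read_mapper.reads2_fastqgzs=... [...]]]
  Reads: [-ifastqc.reads=(file, default=...)]'

# Prints three unique names, one per line.
printf '%s\n' "$sample_help" | list_input_names
```

In practice, the `printf` line would be replaced by something like `dx run "Exome Analysis Demo:Exome Analysis Workflow" -h | list_input_names`, assuming the help text is written to standard output.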

For the Exome Analysis Workflow, one required input parameter needs to be specified manually: `-ibwa_mem_fastq_read_mapper.reads_fastqgzs`.

This parameter targets the first stage of the workflow. For convenience, use the stage number instead of the full stage ID. Since this is the first stage (and workflow stages are zero-indexed), replace `bwa_mem_fastq_read_mapper` with `0` like this: `-i0.reads_fastqgzs`.

The example below shows how to run the same Exome Analysis Workflow on a FASTQ file containing reads, as well as a BWA reference genome, using the default parameters for each subsequent stage.

```shell
$ dx run "Exome Analysis Demo:Exome Analysis Workflow" \
 -i0.reads_fastqgzs="Exome Analysis Demo:/Input/SRR504516_1.fastq.gz" \
 -ibwa_mem_fastq_read_mapper.genomeindex_targz='Reference Genome Files\: AWS US (East):/H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.bwa-index.tar.gz' -y
Using input JSON:
{
  "bwa_mem_fastq_read_mapper.reads_fastqgzs": [
    {
      "$dnanexus_link": {
        "project": "project-BQfgzV80bZ46kf6pBGy00J38",
        "id": "file-B40jg7v8KfPy38kjz1vQ001y"
      }
    }
  ],
  "bwa_mem_fastq_read_mapper.genomeindex_targz": {
    "$dnanexus_link": {
      "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
      "id": "file-B6ZY4942J35xX095VZyQBk0v"
    }
  }
}

Calling workflow-xxxx with output destination
  project-xxxx:/

Analysis ID: analysis-xxxx
```
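The second input in the example above escapes the colon inside the project name `Reference Genome Files: AWS US (East)`, so that only the following colon separates the project from the path. A minimal sketch of applying that escaping in a script is shown here. `escape_project_name` is a hypothetical helper, and the backslash convention is inferred from the example rather than from a full path specification.

```shell
# Hypothetical helper: prefix every colon in a project name with a backslash.
escape_project_name() {
  printf '%s\n' "$1" | sed 's/:/\\:/g'
}

project='Reference Genome Files: AWS US (East)'
file_path='/H. Sapiens - GRCh37 - hs37d5 (1000 Genomes Phase II)/hs37d5.bwa-index.tar.gz'

# Join the escaped project name and the path with an unescaped colon.
genome_index="$(escape_project_name "$project"):${file_path}"

# Print the flag as it would be passed to dx run.
printf '%s\n' "-ibwa_mem_fastq_read_mapper.genomeindex_targz=${genome_index}"
```

When passing the value to `dx run`, keep it quoted (for example `"-i...=${genome_index}"`) so the spaces and parentheses reach the command as a single argument.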

### Specifying Array Input

To specify array input, pass the same parameter multiple times for a stage. For example, the following flags add files 1 through 3 to the `file_inputs` parameter for `stage-xxxx` of the `workflow`:

```shell
$ dx run workflow \
-istage-xxxx.file_inputs=project-xxxx:file-1xxxx \
-istage-xxxx.file_inputs=project-xxxx:file-2xxxx \
-istage-xxxx.file_inputs=project-xxxx:file-3xxxx

Using input JSON:
{
  "stage-xxxx.file_inputs": [
    {
      "$dnanexus_link": {
        "project": "project-xxxx",
        "id": "file-1xxxx"
      }
    },
    {
      "$dnanexus_link": {
        "project": "project-xxxx",
        "id": "file-2xxxx"
      }
    },
    {
      "$dnanexus_link": {
        "project": "project-xxxx",
        "id": "file-3xxxx"
      }
    }
  ]
}
```

If no project is selected, or if the file is in another project, the project containing the files you wish to use must be specified as follows: `-i<stage ID>.<input name>=<project id>:<file id>`.
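When the list of files is long or generated by another command, the repeated flags can be built in a loop. This is a sketch only: `array_input_flags` is a hypothetical helper, the IDs are the placeholders from the example above, and the command is printed rather than executed.

```shell
# Hypothetical helper: print one -i flag per file ID for a single array input.
# Usage: array_input_flags <stage.input> <project ID> <file ID>...
array_input_flags() {
  input_name=$1
  project=$2
  shift 2
  for file_id in "$@"; do
    printf '%s\n' "-i${input_name}=${project}:${file_id}"
  done
}

flags=$(array_input_flags stage-xxxx.file_inputs project-xxxx \
  file-1xxxx file-2xxxx file-3xxxx)

# $flags is left unquoted on purpose so each flag becomes its own word;
# this is safe here because object IDs contain no whitespace.
# Remove `echo` to launch the workflow for real.
echo dx run workflow $flags
```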

### Job-Based Object References (JBORs)

The `-i` flag can also be used to specify [job-based object references](https://documentation.dnanexus.com/developer/api/running-analyses/job-input-and-output#job-based-object-references) (JBORs) with the syntax `-i<stage ID or number>.<input name>=<job id>:<output name>`. When used with `dx run`, the `--brief` flag prints only the ID of the new execution, and the `-y` flag skips the interactive prompt that confirms the execution. Together, these flags let you embed one `dx run` call inside another through command substitution.

The example below calls the [BWA-MEM FASTQ Read Mapper](https://platform.dnanexus.com/app/bwa_mem_fastq_read_mapper) app (platform login required to access this link) to produce the `sorted_bam` output described in the help string produced by running `dx run app-bwa_mem_fastq_read_mapper -h`. This output is then used as input to the first stage of the [Parliament Workflow](https://platform.dnanexus.com/projects/BZqK8kQ0Jz1ZQ8KKpbKbqVjv/workflows/BZv70QQ0Jz1VQvpykGX71Gy0) featured on the DNAnexus Platform (platform login required to access this link).

```shell
$ dx run Parliament \
  -i0.illumina_bam=$(dx run bwa_mem_fastq_read_mapper -ireads_fastqgzs=file-xxxx -ireads2_fastqgzs=file-xxxx -igenomeindex_targz=project-BQpp3Y804Y0xbyG4GJPQ01xv:file-B6qq53v2J35Qyg04XxG0000V -y --brief):sorted_bam \
  -i0.ref_fasta=project-BQpp3Y804Y0xbyG4GJPQ01xv:file-B6qq53v2J35Qyg04XxG0000V \
  -y

Using input JSON:
{
    "stage-F14F5qQ0Jz1gfpjX8y1JxG3y.illumina_bam": {
        "$dnanexus_link": {
            "field": "sorted_bam",
            "job": "job-xxxx"
        }
    },
    "stage-F14F5qQ0Jz1gfpjX8y1JxG3y.ref_fasta": {
        "$dnanexus_link": {
            "project": "project-xxxx",
            "id": "file-B6qq53v2J35Qyg04XxG0000V"
        }
    }
}

Calling workflow-xxxx with output destination project-xxxx:/

Analysis ID: analysis-xxxx
```
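The nested command above can be split into two steps, which makes the job ID available for logging or reuse. The following sketch assumes the same apps and inputs. `jbor_input` is a hypothetical helper, and the calls that would launch real executions are commented out or printed instead of run.

```shell
# Hypothetical helper: format a JBOR input flag for dx run.
# Usage: jbor_input <stage ID or number> <input name> <job ID> <output name>
jbor_input() {
  printf '%s\n' "-i$1.$2=$3:$4"
}

# Step 1: launch the mapper and capture its job ID (commented out because it
# starts a real job):
# mapper_job=$(dx run bwa_mem_fastq_read_mapper -ireads_fastqgzs=file-xxxx \
#   -ireads2_fastqgzs=file-xxxx -y --brief)
mapper_job="job-xxxx"  # placeholder standing in for the captured ID

# Step 2: print the workflow command that consumes the mapper's sorted_bam
# output. Remove `echo` to launch it.
echo dx run Parliament "$(jbor_input 0 illumina_bam "$mapper_job" sorted_bam)" -y
```

Because the reference points to a job rather than a file, the workflow can be launched as soon as the job ID is known, as in the nested example above.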

## Advanced Options

### Quiet Output

Adding the `--brief` flag to a `dx run` command prints only the execution's analysis ID (`analysis-xxxx`) instead of the input JSON for the execution. This ID can be saved for later reference.

```shell
$ dx run workflow-xxxx -i0.input_file=Input/SRR504516_1.fastq.gz -y --brief
analysis-xxxx
```
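In a script, the brief output is typically captured in a variable. The sketch below shows one way to check that the captured text looks like an analysis ID before relying on it. `is_analysis_id` is a hypothetical helper, and the real `dx run` call is commented out so the snippet runs without launching anything.

```shell
# Hypothetical helper: succeed only if the argument starts with "analysis-"
# followed by at least one more character.
is_analysis_id() {
  case "$1" in
    analysis-?*) return 0 ;;
    *) return 1 ;;
  esac
}

# In a real script, capture the ID from dx run:
# analysis_id=$(dx run workflow-xxxx -i0.input_file=Input/SRR504516_1.fastq.gz -y --brief)
analysis_id="analysis-xxxx"  # placeholder standing in for the captured ID

if is_analysis_id "$analysis_id"; then
  echo "Launched ${analysis_id}"
else
  echo "Unexpected dx run output: ${analysis_id}" >&2
fi
```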

### Rerunning Analyses With Modified Settings

To modify specific settings from the previous analysis, you can run the command `dx run --clone analysis-xxxx [options]`. The `[options]` parameters override anything set by the `--clone` flag, and take the form of options passed as input from the command line.

The `--clone` flag does not copy the usage of the `--allow-ssh` or `--debug-on` flags, which must be set with the new execution. Only the applet, instance type, and input spec are copied. See the [Connecting to Jobs](https://documentation.dnanexus.com/developer/apps/execution-environment/connecting-to-jobs) page for more information on the usage of these flags.

For example, the command below redirects the output of the analysis to the `/output` folder of `project-xxxx` and reruns all stages.

```shell
dx run --clone analysis-xxxx \
  --rerun-stage "*" \
  --destination project-xxxx:/output -y
```

{% hint style="info" %}
Only the outputs of stages rerun are placed in the destination specified.
{% endhint %}

### Rerunning Specific Stages

When rerunning workflows, if a stage runs identically to how it ran in a previous analysis, the stage itself is not rerun, and its outputs are not copied or rewritten to a new location. To force a specific stage to run again, use the option `--rerun-stage STAGE_ID`, where `STAGE_ID` is an ID of the form `stage-xxxx`, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0). To rerun all stages of an analysis, use `--rerun-stage "*"`. The asterisk is enclosed in quotes to prevent the shell from expanding it into the names of the files and folders in your current directory (globbing).

The command below reruns the third and final stage of `analysis-xxxx`:

```shell
dx run --clone analysis-xxxx --rerun-stage 2 --brief -y
```
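The effect of quoting the asterisk can be seen without contacting the platform. This sketch creates a scratch directory containing two hypothetical files and uses `echo` to show the arguments `dx run` would receive in each case.

```shell
# Create a scratch directory with two files so the glob has something to match.
demo_dir=$(mktemp -d)
touch "$demo_dir/stage_a.txt" "$demo_dir/stage_b.txt"

# Unquoted: the shell replaces * with the file names before dx would see it.
unquoted=$(cd "$demo_dir" && echo dx run --clone analysis-xxxx --rerun-stage *)

# Quoted: the literal asterisk is passed through unchanged.
quoted=$(cd "$demo_dir" && echo dx run --clone analysis-xxxx --rerun-stage "*")

printf '%s\n' "$unquoted" "$quoted"
# dx run --clone analysis-xxxx --rerun-stage stage_a.txt stage_b.txt
# dx run --clone analysis-xxxx --rerun-stage *

rm -rf "$demo_dir"
```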

### Specifying Analysis Output Folders

The `--destination` flag allows you to specify the path of the output of a workflow. By default, every output of every stage is written to the destination specified.

#### Specifying Output Folders

You can use the `--stage-output-folder <stage_ID> <folder>` flag to specify the output destination of a particular stage in the analysis being run. Here, `stage_ID` is the stage's ID (of the form `stage-xxxx`), the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0). The `folder` is the project and path to which the stage should write, using the syntax `project-xxxx:/PATH`, where `PATH` is the path to the folder in `project-xxxx` that receives the outputs.

The following command reruns all stages of `analysis-xxxx` and sets the output destination of the first stage of the workflow (BWA) to the `mappings` folder in the current project:

```shell
dx run --clone analysis-xxxx --rerun-stage "*" \
  --stage-output-folder 0 "mappings" --brief -y
```

#### Specifying Stage-Relative Output Folders

If you want to place the output folder of a stage within the output folder of the entire analysis, use the flag `--stage-relative-output-folder <stage_id> <folder>`, where `stage_id` is the stage's ID (`stage-xxxx`), the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0). For the `folder` argument, specify a quoted path, relative to the output folder of the analysis, to which that stage writes its output.

The following command reruns all stages of `analysis-xxxx`, setting the output destination of the analysis to `/exome_run`, and the output destination of stage 0 to `/exome_run/mappings` in the current project:

```shell
dx run --clone analysis-xxxx --rerun-stage "*" \
  --destination "exome_run" \
  --stage-relative-output-folder 0 "mappings" --brief -y
```
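The way the two folders combine can be pictured as a simple path join. The helper below is hypothetical and only mirrors the outcome described above (`exome_run` plus `mappings` giving `/exome_run/mappings`); it is not how the platform resolves paths internally.

```shell
# Hypothetical helper: join an analysis destination folder and a
# stage-relative folder into one absolute folder path.
stage_relative_path() {
  dest=${1#/}; dest=${dest%/}   # strip one leading and one trailing slash
  rel=${2#/}; rel=${rel%/}
  if [ -z "$dest" ]; then
    printf '/%s\n' "$rel"       # destination is the project root
  else
    printf '/%s/%s\n' "$dest" "$rel"
  fi
}

stage_relative_path "exome_run" "mappings"   # prints /exome_run/mappings
```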

### Specifying a Different Instance Type

To specify the instance type of all stages in your analysis or a specific set of stages in your analysis, use the flag `--instance-type`. Specifically, the format `--instance-type STAGE_ID=INSTANCE_TYPE` allows you to set the instance type of a specific stage, while `--instance-type INSTANCE_TYPE` sets one instance type for all stages. The two options can be combined, for example, `--instance-type mem2_ssd1_x2 --instance-type my_stage_0=mem3_ssd1_x16` sets all stages' instance types to `mem2_ssd1_x2` except for the stage `my_stage_0`, for which `mem3_ssd1_x16` is used.

Here STAGE\_ID is an ID of a stage, the stage's name, or the index of that stage (where the first stage of a workflow is indexed at 0).

The example below reruns all stages of `analysis-xxxx` and specifies that the first and second stages should be run on `mem1_ssd2_x8` and `mem1_ssd2_x16` instances respectively:

```shell
dx run --clone analysis-xxxx \
  --rerun-stage "*" \
  --instance-type 0=mem1_ssd2_x8 \
  --instance-type 1=mem1_ssd2_x16 \
  --brief -y
```
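A default instance type combined with per-stage overrides, as described at the start of this section, can be assembled in a script. In this sketch, `instance_type_flags` is a hypothetical helper, `my_stage_0` is the stage name used in the text above, and the final command is printed rather than executed.

```shell
# Hypothetical helper: print --instance-type flags for a default type followed
# by any number of STAGE_ID=INSTANCE_TYPE overrides.
instance_type_flags() {
  default_type=$1
  shift
  printf '%s %s' "--instance-type" "$default_type"
  for override in "$@"; do
    printf ' %s %s' "--instance-type" "$override"
  done
  printf '\n'
}

# Left unquoted so each flag and value becomes its own word.
# Remove `echo` to launch the analysis for real.
echo dx run --clone analysis-xxxx --rerun-stage '"*"' \
  $(instance_type_flags mem2_ssd1_x2 my_stage_0=mem3_ssd1_x16) --brief -y
```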

### Adding Metadata to an Analysis

This is identical to adding metadata to a job. See [Adding metadata to a job](https://documentation.dnanexus.com/user/running-apps-and-applets#adding-metadata-to-a-job) for details.

### Monitoring an Analysis

Command line monitoring of an analysis is not available. For information about monitoring a job from the command line, see [Monitoring Executions](https://documentation.dnanexus.com/user/running-apps-and-workflows/monitoring-executions).

{% hint style="warning" %}
On the DNAnexus Platform, jobs are limited to a runtime of 30 days. Jobs that run longer than 30 days are automatically terminated.
{% endhint %}

### Providing Input JSON

This is identical to providing an input JSON to a job. For more information, see [Providing input JSON](https://documentation.dnanexus.com/user/running-apps-and-applets#providing-input-json).

As in running a workflow in non-interactive mode, inputs to a workflow must be specified as `STAGE_ID.<input>`. Here STAGE\_ID is either an ID of the form `stage-xxxx` or the index of that stage in the workflow (starting with the first stage at index 0).
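As a sketch of what such a file might look like for a workflow, the snippet below writes an input JSON keyed by stage index and input name. The IDs are placeholders, the file is created with `mktemp` for illustration, and the `dx run` call using the `-f` (`--input-json-file`) flag is printed rather than executed.

```shell
# Write a workflow input JSON keyed as STAGE_ID.<input>. The quoted 'EOF'
# keeps the shell from expanding $dnanexus_link inside the heredoc.
input_json=$(mktemp)
cat > "$input_json" <<'EOF'
{
  "0.reads_fastqgzs": [
    {
      "$dnanexus_link": {
        "project": "project-xxxx",
        "id": "file-xxxx"
      }
    }
  ]
}
EOF

# Print the command that would use the file; remove `echo` to launch it.
echo dx run workflow-xxxx -f "$input_json" -y --brief
```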
