# Introduction to Building Workflows

Creating a workflow is easiest via the [web interface](/developer/workflows/building-and-running-workflows.md), but you can use the [DNAnexus SDK](/downloads.md), `dx-toolkit`, if you want to automate workflow creation or lock down your workflow. This tutorial provides step-by-step instructions from a local workstation.

For information on building Nextflow workflows, see [Running Nextflow Pipelines](/user/running-apps-and-workflows/running-nextflow-pipelines.md).

## Basic Workflows

A workflow can be created on the DNAnexus Platform from a [`dxworkflow.json`](/developer/workflows/workflow-metadata.md) file.

This tutorial builds a workflow named "BWA MEM + FreeBayes Exome Workflow". The `stages` field of the JSON file holds a list of executables for the workflow. The example includes two stages: the first runs the app BWA-MEM FASTQ Read Mapper and the second runs FreeBayes Variant Caller. The JSON also specifies a name and an output folder for results.

{% hint style="info" %}
The `dxworkflow.json` file in this example contains two similarly named but separate fields: `sorted_bams` and `sorted_bam`. The `sorted_bams` input field of the FreeBayes stage is bound to the `sorted_bam` output field of the BWA-MEM stage.
{% endhint %}

The example `dxworkflow.json` looks as follows:

```json
{
  "name": "BWA MEM + FreeBayes Exome Workflow",
  "outputFolder": "/results",
  "stages": [
    {
      "id": "align_reads",
      "executable": "app-bwa_mem_fastq_read_mapper",
      "input": {
        "genomeindex_targz": {
          "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-B6ZY4942J35xX095VZyQBk0v"
          }
        }
      }
    },
    {
      "id": "call_variants",
      "executable": "app-freebayes",
      "input": {
        "sorted_bams": [{
          "$dnanexus_link": {
            "stage": "align_reads",
            "outputField": "sorted_bam"
          }
        }],
        "genome_fastagz": {
          "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-B6ZY7VG2J35Vfvpkj8y0KZ01"
          }
        }
      }
    }
  ]
}
```

Each stage in the `stages` list must include an `id` (a free-form string unique in the workflow) and an `executable` field that contains the ID or name of an app or an ID of an applet to run in that stage.
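These two requirements can be checked locally before building. The following is a minimal sketch, not part of `dx-toolkit`: the function name `check_stages` is hypothetical, and it verifies only what the paragraph above states (every stage has an `id` and an `executable`, and IDs are unique within the workflow).

```python
import json


def check_stages(workflow):
    """Return a list of problems found in the workflow's `stages` list."""
    problems = []
    seen_ids = set()
    for index, stage in enumerate(workflow.get("stages", [])):
        stage_id = stage.get("id")
        if not stage_id:
            problems.append(f"stage {index}: missing 'id'")
        elif stage_id in seen_ids:
            problems.append(f"stage {index}: duplicate id '{stage_id}'")
        else:
            seen_ids.add(stage_id)
        if not stage.get("executable"):
            problems.append(f"stage {index}: missing 'executable'")
    return problems


if __name__ == "__main__":
    # A trimmed copy of the tutorial workflow, without the stage inputs.
    example = json.loads("""
    {
      "name": "BWA MEM + FreeBayes Exome Workflow",
      "stages": [
        {"id": "align_reads", "executable": "app-bwa_mem_fastq_read_mapper"},
        {"id": "call_variants", "executable": "app-freebayes"}
      ]
    }
    """)
    for problem in check_stages(example):
        print(problem)
```

An empty result means only that these structural checks passed. Whether the named executables exist is still determined by the Platform at build time.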

Add an `input` field for a stage to [bind](/developer/api/running-analyses/workflows-and-analyses.md#binding-input) the stage input to an output or input of another stage. For example, the file array input `sorted_bams` of the second stage, `call_variants`, receives values from the output field `sorted_bam` of the first stage, `align_reads`:

```json
{
  "input": {
    "sorted_bams": [{
      "$dnanexus_link": {
        "stage": "align_reads",
        "outputField": "sorted_bam"
      }
    }]
  }
}
```
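A mistyped stage ID in a binding is easy to miss by eye. As an illustration, the sketch below walks each stage's `input` and reports links whose `stage` value does not match any stage `id` in the same file. The helper names `iter_stage_links` and `unresolved_links` are hypothetical. The check is purely local and does not confirm that the referenced output field exists on the executable.

```python
def iter_stage_links(value):
    """Yield every `$dnanexus_link` body that contains a `stage` key, at any depth."""
    if isinstance(value, dict):
        link = value.get("$dnanexus_link")
        if isinstance(link, dict) and "stage" in link:
            yield link
        else:
            for nested in value.values():
                yield from iter_stage_links(nested)
    elif isinstance(value, list):
        for item in value:
            yield from iter_stage_links(item)


def unresolved_links(workflow):
    """Return (stage id, referenced stage id) pairs whose target stage is not defined."""
    known = {stage["id"] for stage in workflow["stages"]}
    missing = []
    for stage in workflow["stages"]:
        for link in iter_stage_links(stage.get("input", {})):
            if link["stage"] not in known:
                missing.append((stage["id"], link["stage"]))
    return missing


if __name__ == "__main__":
    workflow = {
        "stages": [
            {"id": "align_reads", "executable": "app-bwa_mem_fastq_read_mapper"},
            {
                "id": "call_variants",
                "executable": "app-freebayes",
                "input": {
                    "sorted_bams": [
                        {"$dnanexus_link": {"stage": "align_reads", "outputField": "sorted_bam"}}
                    ]
                },
            },
        ]
    }
    print(unresolved_links(workflow))
```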

{% hint style="info" %}
Input and output field names are defined by the apps or applets they belong to. For apps, find these field names in the app documentation available in the online interface under the **Tools Library**.

To view the names of an executable's input and output fields, run the [`dx describe`](/user/helpstrings-of-sdk-command-line-utilities.md#describe) command.
{% endhint %}

Use the `input` section of a stage to set default values for a field. The example selects the file `hs37d5.bwa-index.tar.gz` (`file-B6ZY4942J35xX095VZyQBk0v`), which is publicly available in the reference project "Apps Data: AWS US (East)" (`project-BQpp3Y804Y0xbyG4GJPQ01xv`), as the default reference file for the alignment step, `align_reads`.

```json
{
  "input": {
    "genomeindex_targz": {
      "$dnanexus_link": {
        "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
        "id": "file-B6ZY4942J35xX095VZyQBk0v"
      }
    }
  }
}
```

## Creating a Workflow on the DNAnexus Platform

Create a workflow object on the DNAnexus Platform with the following steps:

1. On the local workstation, create a directory named "BWA MEM + FreeBayes Exome Workflow". The directory name does not need to match the workflow name exactly, but matching them is a good practice.
2. Place the `dxworkflow.json` file in the new directory.
3. Create the workflow on the DNAnexus Platform by running the following commands from the parent of the new directory:

```shell
$ ls "BWA MEM + FreeBayes Exome Workflow"
dxworkflow.json
$ dx build "BWA MEM + FreeBayes Exome Workflow"
```

After `dx build` finishes, it shows you the ID of the resulting workflow. You can also view this workflow by logging in to your DNAnexus account on the Platform and viewing the workflow from your project's **Manage** page.
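Steps 1 and 2 above can also be scripted, which helps when workflow definitions are generated programmatically. The sketch below is one possible approach using only the Python standard library. The function name `write_workflow_dir` is hypothetical, and naming the directory after the workflow is the convention suggested above, not a requirement. Running `dx build` on the resulting directory remains a separate step.

```python
import json
import tempfile
from pathlib import Path


def write_workflow_dir(parent, workflow):
    """Create <parent>/<workflow name>/dxworkflow.json and return the directory."""
    workflow_dir = Path(parent) / workflow["name"]
    workflow_dir.mkdir(parents=True, exist_ok=True)
    target = workflow_dir / "dxworkflow.json"
    target.write_text(json.dumps(workflow, indent=2) + "\n", encoding="utf-8")
    return workflow_dir


if __name__ == "__main__":
    workflow = {
        "name": "BWA MEM + FreeBayes Exome Workflow",
        "outputFolder": "/results",
        "stages": [
            {"id": "align_reads", "executable": "app-bwa_mem_fastq_read_mapper"},
            {"id": "call_variants", "executable": "app-freebayes"},
        ],
    }
    # A temporary directory keeps the demonstration from touching the current directory.
    with tempfile.TemporaryDirectory() as parent:
        workflow_dir = write_workflow_dir(parent, workflow)
        print(sorted(path.name for path in workflow_dir.iterdir()))
```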

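To run the workflow, supply values for stage inputs that have no default, and optionally override existing defaults, by addressing each input as `<stage ID>.<field name>`: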

```shell
dx run -i align_reads.reads_fastqgzs=myreads.fastq.gz \
  -i align_reads.genomeindex_targz=file-xxxx \
  "BWA MEM + FreeBayes Exome Workflow"
```
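When the same workflow is launched repeatedly from a script, the `-i <stage ID>.<field name>=<value>` arguments can be assembled from a mapping. The sketch below only builds and prints the command line and does not invoke `dx`. The helper name `build_dx_run_args` is hypothetical, and `file-xxxx` is the same placeholder used in the command above.

```python
import shlex


def build_dx_run_args(workflow_name, stage_inputs):
    """Return the argument list for `dx run`, one `-i stage.field=value` per input."""
    args = ["dx", "run"]
    for (stage_id, field), value in stage_inputs.items():
        args += ["-i", f"{stage_id}.{field}={value}"]
    args.append(workflow_name)
    return args


if __name__ == "__main__":
    args = build_dx_run_args(
        "BWA MEM + FreeBayes Exome Workflow",
        {
            ("align_reads", "reads_fastqgzs"): "myreads.fastq.gz",
            ("align_reads", "genomeindex_targz"): "file-xxxx",
        },
    )
    # shlex.join quotes the workflow name so the line can be pasted into a shell.
    print(shlex.join(args))
```

Keying the mapping by `(stage ID, field name)` tuples avoids ambiguity, because stage IDs are free-form strings.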

## Locked Workflows

### Reasons to Lock a Workflow

Sometimes, you may want to prevent users from changing certain stage inputs in a workflow. For example, you might want to ensure that only a specific reference genome is used, and restrict users from modifying the reference genome input.

To achieve that, you can add workflow-level `inputs` and `outputs` fields during creation, with links to stage inputs and outputs. When the workflow runs, users can pass values only to fields defined in `inputs`. All the parameters that are not visible in this workflow-level I/O interface cannot be changed.

Creating locked workflows is useful to simplify the workflow execution and make it clear which inputs users are expected to provide.

This approach also improves execution of WDL workflows on the Platform, because WDL workflows explicitly specify workflow inputs and outputs.

### Building a Locked-Down Workflow

The example shows a locked-down version named "BWA MEM + FreeBayes Exome Workflow (locked)". All inputs are locked except `reads_fastqgzs` in the `align_reads` stage. When locking workflows, list the inputs that are *not* locked in the workflow-level `inputs` field. All other inputs become locked and users cannot override them at runtime.

#### Inputs

To create a locked workflow, add a workflow-level input specification in the `inputs` field, for example:

```json
{
  "inputs": [
    {
      "name": "reads",
      "help": "An array of files, in gzipped FASTQ format.",
      "class": "array:file",
      "patterns": [ "*.fq.gz", "*.fastq.gz" ]
    }
  ]
}
```

In this case the workflow has only one input, named `reads`.

#### Stages

Next, define which stage or stages consume that input by adding a link from those stages to the workflow input using the `workflowInputField` field, as in the following example. If a file is supplied to `reads` at runtime, it is directed to `reads_fastqgzs` in the `align_reads` stage.

```json
{
  "stages": [
    {
      "id": "align_reads",
      "name": "BWA MEM",
      "executable": "app-bwa_mem_fastq_read_mapper",
      "input": {
        "reads_fastqgzs": {
          "$dnanexus_link": {
            "workflowInputField": "reads"
          }
        },
        "genomeindex_targz": {
          "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-B6ZY4942J35xX095VZyQBk0v"
          }
        }
      }
    },
    {
      "id": "call_variants",
      "name": "FreeBayes",
      "executable": "app-freebayes",
      "folder": "call_variants_output",
      "input": {
        "sorted_bams": [{
          "$dnanexus_link": {
            "stage": "align_reads",
            "outputField": "sorted_bam"
          }
        }],
        "genome_fastagz": {
          "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-B6ZY7VG2J35Vfvpkj8y0KZ01"
          }
        }
      }
    }
  ]
}
```

Notice that the input fields `genomeindex_targz` and `genome_fastagz` are not included in the workflow-level `inputs`, indicating that these fields are locked. Because users cannot supply values for locked fields, set these values in each stage's `input` field (not the workflow-level `inputs`). The workflow then runs with the values `file-B6ZY4942J35xX095VZyQBk0v` and `file-B6ZY7VG2J35Vfvpkj8y0KZ01` respectively.
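To see at a glance which stage inputs a definition exposes and which it fixes, the stage `input` blocks can be classified by whether they link to a workflow-level input. This is a rough local sketch, and the function name `classify_stage_inputs` is hypothetical. It treats any stage input that is not a `workflowInputField` link as fixed in the stage, which includes both literal values and links to other stages.

```python
def classify_stage_inputs(workflow):
    """Split stage input fields into those exposed via `inputs` and those fixed in the stage."""
    exposed, fixed = {}, {}
    for stage in workflow["stages"]:
        for field, value in stage.get("input", {}).items():
            key = f"{stage['id']}.{field}"
            link = value.get("$dnanexus_link") if isinstance(value, dict) else None
            if isinstance(link, dict) and "workflowInputField" in link:
                # Maps the stage field to the workflow-level input that feeds it.
                exposed[key] = link["workflowInputField"]
            else:
                fixed[key] = value
    return exposed, fixed


if __name__ == "__main__":
    workflow = {
        "stages": [
            {
                "id": "align_reads",
                "executable": "app-bwa_mem_fastq_read_mapper",
                "input": {
                    "reads_fastqgzs": {"$dnanexus_link": {"workflowInputField": "reads"}},
                    "genomeindex_targz": {
                        "$dnanexus_link": {
                            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
                            "id": "file-B6ZY4942J35xX095VZyQBk0v",
                        }
                    },
                },
            }
        ]
    }
    exposed, fixed = classify_stage_inputs(workflow)
    print("exposed:", exposed)
    print("fixed:", sorted(fixed))
```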

#### Required Inputs

Any required stage inputs in a locked workflow must be specified in the `dxworkflow.json`. In the example, the stages have the following required inputs:

* The `align_reads` stage has the inputs `reads_fastqgzs` and `genomeindex_targz`.
* The `call_variants` stage has the inputs `sorted_bams` and `genome_fastagz`.

The `reads_fastqgzs` input is exposed through the workflow-level input `reads` in `inputs`. This input is not locked and users supply its value. The remaining inputs are locked, so the workflow creator must set their values. In the example workflow definition, those values are set in each stage's `input` block.

If the workflow-level `inputs` specification is null or not specified at all, the workflow can accept inputs provided directly to the workflow stages by the user.

Multiple stages can also link to the same workflow-level input.

#### Outputs

Optionally, specify workflow-level `outputs`:

```json
{
  "outputs": [
    {
      "name": "variants",
      "class": "file",
      "outputSource": {
        "$dnanexus_link": {
          "stage": "call_variants",
          "outputField": "variants_vcfgz"
        }
      }
    }
  ]
}
```

The `outputSource` field configures which stage-level outputs become workflow outputs. Together with `inputs`, this is useful when setting a workflow as an executable within another workflow.
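Each `outputSource` names a stage by its `id`, so a typo there can be caught with a local check similar to the one for stage bindings. The sketch below assumes the `outputs` layout shown above. The function name `check_output_sources` is hypothetical, and the check does not verify that the output field exists on the stage's executable.

```python
def check_output_sources(workflow):
    """Return messages for workflow outputs whose `outputSource` names an unknown stage."""
    stage_ids = {stage["id"] for stage in workflow.get("stages", [])}
    problems = []
    for output in workflow.get("outputs", []):
        link = output.get("outputSource", {}).get("$dnanexus_link", {})
        stage_id = link.get("stage")
        if stage_id not in stage_ids:
            problems.append(f"output '{output.get('name')}': unknown stage '{stage_id}'")
    return problems


if __name__ == "__main__":
    workflow = {
        "stages": [{"id": "call_variants", "executable": "app-freebayes"}],
        "outputs": [
            {
                "name": "variants",
                "class": "file",
                "outputSource": {
                    "$dnanexus_link": {
                        "stage": "call_variants",
                        "outputField": "variants_vcfgz",
                    }
                },
            }
        ],
    }
    print(check_output_sources(workflow))
```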

#### Full JSON Description of a Locked-Down Workflow

The example `dxworkflow.json` description looks as follows:

```json
{
  "name": "BWA MEM + FreeBayes Exome Workflow (locked)",
  "outputFolder": "/results",
  "inputs": [
    {
      "name": "reads",
      "label": "Reads",
      "help": "An array of files, in gzipped FASTQ format.",
      "class": "array:file",
      "patterns": [
        "*.fq.gz",
        "*.fastq.gz"
      ]
    }
  ],
  "stages": [
    {
      "id": "align_reads",
      "name": "BWA MEM",
      "executable": "app-bwa_mem_fastq_read_mapper",
      "input": {
        "reads_fastqgzs": {
          "$dnanexus_link": {
            "workflowInputField": "reads"
          }
        },
        "genomeindex_targz": {
          "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-B6ZY4942J35xX095VZyQBk0v"
          }
        }
      }
    },
    {
      "id": "call_variants",
      "name": "FreeBayes",
      "executable": "app-freebayes",
      "folder": "call_variants_output",
      "input": {
        "sorted_bams": [{
          "$dnanexus_link": {
            "stage": "align_reads",
            "outputField": "sorted_bam"
          }
        }],
        "genome_fastagz": {
          "$dnanexus_link": {
            "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
            "id": "file-B6ZY7VG2J35Vfvpkj8y0KZ01"
          }
        }
      }
    }
  ],
  "outputs": [
    {
      "name": "variants",
      "class": "file",
      "outputSource": {
        "$dnanexus_link": {
          "stage": "call_variants",
          "outputField": "variants_vcfgz"
        }
      }
    }
  ]
}
```

Build the workflow by running this command from the parent of the directory "BWA MEM + FreeBayes Exome Workflow (locked)" (which contains the `dxworkflow.json`):

```shell
dx build "BWA MEM + FreeBayes Exome Workflow (locked)"
```

### Running a Locked-Down Workflow via the CLI

To run the workflow, pass a FASTQ input file to the workflow-level `reads` input field:

```shell
dx run "BWA MEM + FreeBayes Exome Workflow (locked)" \
  -i reads="Exome Analysis Demo":/Input/SRR504516_1.fastq.gz
```

Locked workflows do not accept inputs passed directly to a stage, for example `-ialign_reads.reads_fastqgzs=my_input_file.fastq.gz`.

To find out how to run the workflow and what inputs it accepts, run this command:

```shell
dx run "BWA MEM + FreeBayes Exome Workflow (locked)" --help
```

### Running a Locked-Down Workflow via the UI

Running a locked workflow in the UI resembles running an app, with separate input and output panels.

Locked workflows cannot be edited or created in the UI. You can build locked workflows in the CLI by using [`dx get`](/user/helpstrings-of-sdk-command-line-utilities.md#get) and then [`dx build`](/user/helpstrings-of-sdk-command-line-utilities.md#build).

### Locking Down an Existing Workflow

To lock down an existing workflow, run `dx get "BWA MEM + FreeBayes Exome Workflow"`, add `inputs` to the downloaded `dxworkflow.json`, set `workflowInputField` references from stages to these `inputs`, and run `dx build` again.
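The edit to the downloaded `dxworkflow.json` can be sketched as a small transformation: add the workflow-level `inputs`, then point the chosen stage fields at them with `workflowInputField` links. The code below is an illustration under stated assumptions rather than a `dx-toolkit` feature. The function name `lock_workflow` and the shape of its `bindings` argument are hypothetical. It works on an in-memory dictionary, so reading and writing the file, and running `dx build`, remain separate steps.

```python
import copy


def lock_workflow(workflow, input_specs, bindings):
    """Return a copy of `workflow` with workflow-level inputs and stage links added.

    input_specs: list of workflow-level input specifications (dicts with a `name`).
    bindings: maps (stage id, stage field) to the name of a workflow-level input.
    """
    locked = copy.deepcopy(workflow)
    locked["inputs"] = copy.deepcopy(input_specs)
    declared = {spec["name"] for spec in input_specs}
    stages = {stage["id"]: stage for stage in locked["stages"]}
    for (stage_id, field), input_name in bindings.items():
        if input_name not in declared:
            raise ValueError(f"'{input_name}' is not declared in inputs")
        if stage_id not in stages:
            raise ValueError(f"unknown stage '{stage_id}'")
        stage_input = stages[stage_id].setdefault("input", {})
        stage_input[field] = {"$dnanexus_link": {"workflowInputField": input_name}}
    return locked


if __name__ == "__main__":
    unlocked = {
        "name": "BWA MEM + FreeBayes Exome Workflow",
        "stages": [
            {"id": "align_reads", "executable": "app-bwa_mem_fastq_read_mapper"}
        ],
    }
    reads_spec = {
        "name": "reads",
        "help": "An array of files, in gzipped FASTQ format.",
        "class": "array:file",
        "patterns": ["*.fq.gz", "*.fastq.gz"],
    }
    locked = lock_workflow(
        unlocked, [reads_spec], {("align_reads", "reads_fastqgzs"): "reads"}
    )
    print(locked["stages"][0]["input"])
```

Returning a modified copy leaves the original definition untouched, so the two versions can be compared before rebuilding.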

