
Copyright 2025 DNAnexus

dxCompiler

An introduction to using dxCompiler, a tool for compiling WDL and CWL workflows on the DNAnexus platform

Last updated 2 years ago

dxCompiler is a tool for compiling pipelines written in the Workflow Description Language (WDL) and the Common Workflow Language (CWL) to equivalent workflows on the DNAnexus platform.

Below we introduce a few use cases for dxCompiler. The tool can be downloaded from the dxCompiler GitHub repository, which also documents advanced options, including DNAnexus extensions and integration with private Docker repositories.

dxCompiler Setup

For information on how to set up dxCompiler, see the instructions on the dxCompiler GitHub page.

WDL Example

Validating a Workflow

dxCompiler uses wdlTools, a parser that adheres strictly to the WDL specification. Most of the problematic automatic type conversions allowed by some other WDL runtime engines are not allowed by dxCompiler. Use the command-line tools in wdlTools (e.g. check and lint) to validate your WDL files before trying to compile them with dxCompiler.

Compiling and Running a Workflow

The bam_chrom_counter workflow is written in WDL. Task slice_bam splits a bam file into an array of sub-files. Task count_bam counts the number of alignments in a bam file. The workflow takes an input bam file, calls slice_bam to split it by chromosome, and calls count_bam in parallel on each slice. The results are a bam index file and an array with the number of reads per chromosome.
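Before reading the WDL source, the workflow's control flow can be sketched in plain Python. This is a hypothetical stand-in for illustration only: the real tasks run samtools inside Docker containers on separate DNAnexus workers, and the function bodies below are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def slice_bam(bam, num_chrom=22):
    # Stand-in for the slice_bam task: one slice file per chromosome.
    return [f"slices/{i}.bam" for i in range(1, num_chrom + 1)]

def count_bam(bam_slice):
    # Stand-in for the count_bam task (`samtools view -c`).
    return 0

def bam_chrom_counter(bam):
    slices = slice_bam(bam)                        # call slice_bam
    with ThreadPoolExecutor() as pool:             # scatter: one count_bam per slice
        counts = list(pool.map(count_bam, slices))
    return counts                                  # gather: Array[Int] count

print(len(bam_chrom_counter("input.bam")))  # 22 slices -> 22 counts
```

The scatter block in the WDL below plays the role of the `pool.map` call: each iteration becomes an independent job on the platform.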

bam_chrom_counter.wdl
version 1.0

workflow bam_chrom_counter {
    input {
        File bam
    }

    call slice_bam {
        input : bam = bam
    }
    scatter (slice in slice_bam.slices) {
        call count_bam {
            input: bam = slice
        }
    }
    output {
        File bai = slice_bam.bai
        Array[Int] count = count_bam.count
    }
}

task slice_bam {
    input {
        File bam
        Int num_chrom = 22
    }
    command <<<
    set -ex
    samtools index ~{bam}
    mkdir slices/
    for i in `seq ~{num_chrom}`; do
        samtools view -b ~{bam} -o slices/$i.bam $i
    done
    >>>
    runtime {
        docker: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
    }
    output {
        File bai = "~{bam}.bai"
        Array[File] slices = glob("slices/*.bam")
    }
}

task count_bam {
    input {
        File bam
    }

    command <<<
        samtools view -c ~{bam}
    >>>
    runtime {
        docker: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
    }
    output {
        Int count = read_int(stdout())
    }
}

From the command line, we can compile the workflow to the DNAnexus platform using the dxCompiler jar file.

$ java -jar dxCompiler.jar compile bam_chrom_counter.wdl \
    -project project-xxxx -folder /my/workflows/

This compiles the source WDL file to several platform objects in the specified DNAnexus project project-xxxx under the folder /my/workflows/:

  • A workflow bam_chrom_counter

  • Two applets that can be called independently: slice_bam, and count_bam

  • A few auxiliary applets that process workflow inputs and outputs and launch the scatter

You can review the structure of the compiled DNAnexus workflow using dxCompiler's describe subcommand. The output shows the generated DNAnexus workflows and applets in a tree describing the caller/callee relationships.

$ java -jar dxCompiler.jar describe workflow-G3P0jFQ0Fgk3Z7855fqKyjPy --pretty
Workflow: bam_chrom_counter
├───App Inputs: common
├───App Task: slice_bam
├───App Fragment: scatter (slice in slice_bam.slices)
│   └───App Task: count_bam
└───App Outputs: outputs
The generated workflow can be executed from the web UI or via the DNAnexus command-line client. For example, to run the workflow with the input bam file project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV, use the following command:

$ dx run bam_chrom_counter \
  -istage-common.bam=project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV
Alternatively, you can convert a standard (Cromwell-format) JSON input file into a DNAnexus input file when compiling the workflow, then pass that file to dx run with the -f option:

$ java -jar dxCompiler.jar compile bam_chrom_counter.wdl \
   -project project-xxxx -folder /my/workflows/ \
   -inputs bam_chrom_counter_input.json
$ dx run bam_chrom_counter -f bam_chrom_counter_input.dx.json
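The two input files have different shapes. The sketch below is illustrative only (hypothetical project and file IDs; the exact generated layout may vary): a standard Cromwell-style input keyed by `<workflow>.<input>` next to the stage-addressed DNAnexus input that dxCompiler writes to the companion *.dx.json file.

```python
import json

# Standard (Cromwell-style) inputs: keys are "<workflow>.<input>".
# Project and file IDs are hypothetical placeholders.
cromwell_inputs = {
    "bam_chrom_counter.bam": "dx://project-xxxx:file-yyyy"
}

# DNAnexus inputs generated by `compile ... -inputs`: keys address
# workflow stages, and file values become $dnanexus_link objects
# (shape shown for illustration only).
dx_inputs = {
    "stage-common.bam": {
        "$dnanexus_link": {"project": "project-xxxx", "id": "file-yyyy"}
    }
}

print(json.dumps(dx_inputs, indent=2))
```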
After launching the workflow analysis, you can monitor it on the CLI or from the UI. The following CLI session shows the executed workflow:

$ dx find executions -n1
* bam_chrom_counter (done) analysis-G3P0kGj0Fgk8ZY0645QYZ9bB
│ demo-user 2022-03-01 11:42:59
├── * outputs (bam_chrom_counter_outputs:main) (done) job-G3P0kkQ0Fgk8ZY0645QYZ9bX
│     demo-user 2022-03-01 11:43:44 (runtime 0:00:42)
├── * scatter (slice in slice_bam.slices) (bam_chrom_counter_frag_stage-1:main) (done) job-G3P0kj00Fgk8ZY0645QYZ9bV
│   │ demo-user 2022-03-01 11:43:38 (runtime 0:00:46)
│   ├── collect_scatter (bam_chrom_counter_frag_stage-1:collect) (done) job-G3P1Gb80Fgk8gx406ppxVZX6
│   │   demo-user 2022-03-01 12:24:05 (runtime 0:00:42)
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1Gb80Fgk3ZjYx714yFZyQ
│   │   demo-user 2022-03-01 12:24:05 (runtime 0:01:31)
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1Gb00FgkBb45v7197QVb2
│   │   demo-user 2022-03-01 12:24:04 (runtime 0:01:34)
...
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZQ0Fgk0QBKp6pVZgzFg
│   │   demo-user 2022-03-01 12:24:01 (runtime 0:01:33)
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZ80Fgk9vFj26Vg599P4
│   │   demo-user 2022-03-01 12:24:01 (runtime 0:01:33)
│   └── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZ80Fgk6VJ5v6kGYp5k9
│       demo-user 2022-03-01 12:24:01 (runtime 0:01:53)
├── * slice_bam (done) job-G3P0kfQ0Fgk8ZY0645QYZ9bQ
│     demo-user 2022-03-01 11:43:31 (runtime 0:37:14)
└── * common (bam_chrom_counter_common:main) (done) job-G3P0kZj0Fgk8ZY0645QYZ9bP
      demo-user 2022-03-01 11:43:25 (runtime 0:00:45)

When the workflow execution completes, you can also review the analysis and its stages in the UI.

CWL Example

Preprocessing a CWL Workflow

dxCompiler requires the source CWL file to be "packed" as a cwl.json file, which contains a single compound workflow with all the dependent processes included. Additionally, you may need to upgrade the version of your workflow to CWL v1.2.
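What "packed" means in practice: the packed JSON bundles the top-level workflow and every tool it calls into a single document under a $graph array. The sketch below is illustrative (real packed files carry many more fields, and the ids here are assumptions), but it shows the expected top-level shape and a quick sanity check you could run before compiling.

```python
import json

# Minimal illustration of a packed CWL document: one $graph array
# holding the Workflow plus the CommandLineTools it depends on.
packed = {
    "cwlVersion": "v1.2",
    "$graph": [
        {"class": "Workflow", "id": "#main"},
        {"class": "CommandLineTool", "id": "#slice_bam"},
        {"class": "CommandLineTool", "id": "#count_bam"},
    ],
}

def check_packed(doc):
    # Sanity checks: single compound document, v1.2, exactly one Workflow.
    assert doc.get("cwlVersion") == "v1.2", "upgrade to CWL v1.2 first"
    workflows = [p for p in doc["$graph"] if p["class"] == "Workflow"]
    assert len(workflows) == 1, "expected a single top-level Workflow"
    return workflows[0]["id"]

print(check_packed(packed))  # -> #main
```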

To illustrate upgrading, packing, and running a CWL workflow, we'll use a bam_chrom_counter CWL workflow similar to the WDL example above. It is written in CWL v1.0; the top-level Workflow in bam_chrom_counter.cwl calls the two CommandLineTools defined in slice_bam.cwl and count_bam.cwl.

bam_chrom_counter.cwl
cwlVersion: v1.0
id: bam_chrom_counter
class: Workflow
requirements:
- class: ScatterFeatureRequirement
inputs:
- id: bam
  type: File
  # upload this local file to the platform and replace the path below with the DNAnexus URI "dx://project-xxx:file-yyyy"
  default: "path/to/my/input_bam"
outputs:
- id: bai
  type: File
  outputSource: slice_bam/bai
- id: count
  type: int[]
  outputSource: count_bam/count
steps:
- id: slice_bam
  run: slice_bam.cwl
  in:
    bam: bam
  out: [bai, slices]
- id: count_bam
  run: count_bam.cwl
  scatter: bam
  in:
    bam: slice_bam/slices
  out: [count]
slice_bam.cwl
cwlVersion: v1.0
id: slice_bam
class: CommandLineTool
inputs:
- id: bam
  type: File
- id: num_chrom
  default: 22
  type: int
outputs:
- id: bai
  type: File
  outputBinding:
    glob: $(inputs.bam.basename).bai
- id: slices
  type: File[]
  outputBinding:
    glob: "slices/*.bam"
requirements:
- class: InlineJavascriptRequirement
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
- class: InitialWorkDirRequirement
  listing:
  - entryname: slice_bam.sh
    entry: |-
      set -ex
      samtools index $1
      mkdir slices/
      for i in `seq $2`; do
          samtools view -b $1 -o slices/$i.bam $i
      done
  - entry: $(inputs.bam)
baseCommand: ["sh", "slice_bam.sh"]
arguments:
  - position: 0
    valueFrom: $(inputs.bam.basename)
  - position: 1
    valueFrom: $(inputs.num_chrom)
hints:
- class: NetworkAccess
  networkAccess: true
- class: LoadListingRequirement
  loadListing: deep_listing
count_bam.cwl
cwlVersion: v1.0
id: count_bam
class: CommandLineTool
requirements:
- class: InlineJavascriptRequirement
- class: ShellCommandRequirement
- class: DockerRequirement
  dockerPull: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
inputs:
- id: bam
  type: File
  inputBinding:
    position: 1
baseCommand: ["samtools", "view", "-c"]
outputs:
- id: count
  type: int
  outputBinding:
    glob: stdout
    loadContents: true
    outputEval: "$(parseInt(self[0].contents))"
stdout: stdout
hints:
- class: NetworkAccess
  networkAccess: true
- class: LoadListingRequirement
  loadListing: deep_listing

Before compilation, follow the steps below to preprocess these CWL files:

  1. Install cwl-upgrader and upgrade the CWL files to v1.2 (needed here because the files are written in CWL v1.0):

    $ pip3 install cwl-upgrader
    
    # upgrade all dependent CWL files, which will be saved in the current working directory
    $ cd contrib/beginner_example
    $ cwl-upgrader cwl_v1.0/bam_chrom_counter.cwl cwl_v1.0/slice_bam.cwl cwl_v1.0/count_bam.cwl
  2. De-localize all local paths referenced in the CWL: if the CWL specifies a local path, e.g. a schema or a default value for a file-type input (like the default path "path/to/my/input_bam" for input bam in bam_chrom_counter.cwl), upload that file to a DNAnexus project and replace the local path in the CWL with its full DNAnexus URI, e.g. dx://project-XXX:file-YYY.
  3. Install the sbpack package and run the cwlpack command on the top-level workflow file to build a single packed file containing the top-level workflow and all the steps it depends on:

    $ pip3 install sbpack
    $ cwlpack --add-ids --json bam_chrom_counter.cwl > bam_chrom_counter.cwl.json

Validating a Workflow

dxCompiler compiles tools and workflows written according to the CWL v1.2 standard. You can use cwltool --validate to validate the packed CWL file before compiling it.

Compiling and Running a Workflow

Once the workflow is upgraded and packed as described above, we can compile it as a DNAnexus workflow and run it.

$ java -jar dxCompiler.jar compile bam_chrom_counter.cwl.json -project project-xxxx -folder /my/workflows/
$ dx run bam_chrom_counter -istage-common.bam=project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV

Limitations

  • WDL and CWL

    • All task and workflow names must be unique across the entire import tree

      • For example, if A.wdl imports B.wdl and A.wdl defines workflow foo, then B.wdl cannot have a workflow or task named foo

    • Subworkflows built from higher-level workflows are not intended to be used on their own

  • WDL only

    • Workflows with forward references (i.e. a variable referenced before it is declared) are not yet supported

    • The call ... after syntax introduced in WDL 1.1 is not yet supported

    • Calls with missing arguments have limited support

    • The alternative workflow output syntax that has been deprecated since WDL draft-2 is not supported

  • CWL only

    • Calling native DNAnexus apps/applets in a CWL workflow using dxni is not supported

    • SoftwareRequirement and InplaceUpdateRequirement are not yet supported

    • Publishing a dxCompiler-generated workflow as a global workflow is not supported

    • Applet and job reuse is not supported

Help String

dxCompiler's capabilities are outlined in its help string below.

$ java -jar dxCompiler.jar version
2.10.1
$ java -jar dxCompiler.jar help

java -jar dxCompiler.jar <action> <parameters> [options]

Actions:
  version
    Prints the dxCompiler version.

  config
    Prints the current dxCompiler configuration.

  describe <DxWorkflow ID>
    Generate the JSON execution tree for a given DNAnexus workflow ID.
    The workflow needs to be have been previously compiled by dxCompiler.
    options
      -pretty                Print exec tree in "pretty" text format instead of JSON.

  compile <WDL or CWL file>
    Compile a WDL or CWL file to a DNAnexus workflow or applet.
    options
      -archive               Archive older versions of applets.
      -compileMode [IR, All]
                             Compilation mode - If not specified, the compilation
                             mode is "All" and the compiler will translate WDL or CWL
                             inputs into DNAnexus workflows and tasks.
                             Use "IR" if you only want to parse CWL or WDL files and
                             convert standard-formatted input files to DNAnexus JSON
                             input format without performing full compilation.
      -defaults <string>     JSON file with standard-formatted default values.
      -defaultInstanceType <string>
                             The default instance type to use for "helper" applets
                             that perform runtime evaluation of instance type
                             requirements. This instance type is also used when
                             the '-instanceTypeSelection dynamic' option is set.
                             This value is overriden by any defaults set in set in the
                             JSON file specified by '-extras'.
      -destination <string>  Full platform path (project:/folder).
      -execTree [json,pretty]
                             Print a JSON representation of the workflow.
      -extras <string>       JSON file with extra options (see documentation).
      -inputs <string>       JSON file with standard-formatted input values. May be
                             specified multiple times. A DNAnexus JSON input file is
                             generated for each standard input file.
      -instanceTypeSelection [static,dynamic]
                             Whether to select instance types at compile time for tasks with
                             runtime requirements that can all be statically evaluated
                             (the default "static" option), or to defer instance type
                             selection in such cases to runtime (the "dynamic" option).
                             Using static instance type selection can save time, but it
                             requires the same set of instances to be accessible during WDL/CWL
                             compilation and during the runtime of the generated applets and
                             workflows. Use the "dynamic" option if you plan on creating global
                             DNAnexus workflows or cloning the generated workflows between
                             DNAnexus organizations with different available instance types.
      -locked                Create a locked workflow. When running a locked workflow,
                             input values may only be specified for the top-level workflow.
      -leaveWorkflowsOpen    Leave created workflows open (otherwise they are closed).
      -p | -imports <string> Directory to search for imported WDL or CWL files. May be specified
                             multiple times.
      -projectWideReuse      Look for existing applets/workflows in the entire project
                             before generating new ones. The default search scope is the
                             target folder only.
      -reorg                 Reorganize workflow output files.
      -runtimeDebugLevel [0,1,2]
                             How much debug information to write to the job log at runtime.
                             Log the minimum (0), intermediate (1, the default), or all
                             debug information (2, for internal debugging).
      -separateOutputs       Store the output files of each call in a separate folder. The
                             default behavior is to put all outputs in the same folder.
      -streamFiles [all,none,perfile]
                             Whether to mount all files with dxfuse (do not use the
                             download agent), to mount no files with dxfuse (only use
                             download agent), or to respect the per-file settings in WDL
                             parameter_meta sections (default).
      -useManifests          Use manifest files for all workflow and applet inputs and
                             outputs. Implies -locked.
      -waitOnUpload          Whether to wait for each file upload to complete.

  dxni (WDL only)
    DNAnexus Native call Interface. Creates stubs for calling DNAnexus executables
    (apps/applets/workflows), and stores them as WDL tasks in a local file. Enables
    calling existing platform executables without modification.
    options:
      -apps [include,exclude,only]
                             Option 'include' includes both apps and applets, 'exclude'
                             excludes apps and generates applet stubs only, 'only'
                             generates app stubs only.
      -f | force             Delete any existing output file.
      -o <path>              Destination file for WDL task definitions (defaults to
                             stdout).
      -path <string>         Name of a specific app or a path to a specific applet.
      -r | recursive         Search recursively for applets in the target folder.

Common options
    -folder <string>         Platform folder (defaults to '/').
    -project <string>        Platform project (defaults to currently selected project).
    -language <string> [ver] Which language to use? May be WDL or CWL. You can optionally
                             specify a version. Currently: i. WDL: draft-2, 1.0, and 1.1, and
                             ii. CWL: 1.2 are supported and WDL development is partially
                             supported. The default is to auto-detect the language from the
                             source file.
    -quiet                   Do not print warnings or informational outputs.
    -verbose                 Print detailed logging.
    -verboseKey <module>     Print verbose output only for a specific module. May be
                             specified multiple times.
    -logFile <path>          File to use for logging output; defaults to stderr.


A detailed description of advanced dxCompiler features can be found in the public dxCompiler GitHub repository.
