dxCompiler

An introduction to using dxCompiler, a tool for compiling WDL and CWL workflows on the DNAnexus Platform

dxCompiler is a tool for compiling pipelines written in the Workflow Description Language (WDL) and the Common Workflow Language (CWL) to equivalent workflows on the DNAnexus Platform.

The following sections introduce a few use cases for dxCompiler. The tool can be downloaded from the dxCompiler GitHub repository, which also documents more advanced options, including DNAnexus extensions and integration with private Docker repositories.

dxCompiler Setup

For information on how to set up dxCompiler, see the instructions for dxCompiler setup on GitHub.

WDL Example

Validating a Workflow

dxCompiler uses wdlTools, a parser that adheres strictly to the WDL specification. As a result, dxCompiler rejects most of the problematic automatic type conversions that other WDL runtime engines allow. Use the wdlTools command-line tools to validate your WDL files before compiling them with dxCompiler; the check and lint commands are especially useful for this.
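As a minimal sketch, the Python helpers below assemble the wdlTools check and lint invocations for a WDL file and, optionally, run them in order with the standard subprocess module. They assume wdlTools is run as a JAR; the path wdlTools.jar is a placeholder (release JARs carry a version number), and the helper names are hypothetical.

```python
import subprocess
from typing import List

# Placeholder path (assumption): the actual wdlTools release JAR is versioned.
WDLTOOLS_JAR = "wdlTools.jar"


def wdltools_command(action: str, wdl_file: str, jar: str = WDLTOOLS_JAR) -> List[str]:
    """Build the argv for a wdlTools validation action ('check' or 'lint')."""
    if action not in ("check", "lint"):
        raise ValueError(f"unsupported action: {action}")
    return ["java", "-jar", jar, action, wdl_file]


def validate_wdl(wdl_file: str, run: bool = False) -> List[List[str]]:
    """Return the check and lint commands; execute them in order if run=True.

    check=True makes subprocess raise CalledProcessError on a non-zero exit
    status, so a failing command stops the sequence.
    """
    commands = [wdltools_command(action, wdl_file) for action in ("check", "lint")]
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in validate_wdl("bam_chrom_counter.wdl"):
        print(" ".join(cmd))
```

Running check before lint means type errors surface first; lint findings are typically style or portability warnings.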

Compiling and Running a Workflow

The bam_chrom_counter workflow is written in WDL. Task slice_bam splits a BAM file into an array of sub-files, one per chromosome. Task count_bam counts the number of alignments in a BAM file. The workflow takes an input BAM file, calls slice_bam to split it by chromosome, and calls count_bam in parallel on each chromosome. The workflow outputs a BAM index file and an array with the number of alignments per chromosome.

bam_chrom_counter.wdl
version 1.0

workflow bam_chrom_counter {
    input {
        File bam
    }

    call slice_bam {
        input: bam = bam
    }
    scatter (slice in slice_bam.slices) {
        call count_bam {
            input: bam = slice
        }
    }
    output {
        File bai = slice_bam.bai
        Array[Int] count = count_bam.count
    }
}

task slice_bam {
    input {
        File bam
        Int num_chrom = 22
    }
    command <<<
    set -ex
    samtools index ~{bam}
    mkdir slices/
    for i in `seq ~{num_chrom}`; do
        samtools view -b ~{bam} -o slices/$i.bam $i
    done
    >>>
    runtime {
        docker: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
    }
    output {
        File bai = "~{bam}.bai"
        Array[File] slices = glob("slices/*.bam")
    }
}

task count_bam {
    input {
        File bam
    }

    command <<<
        samtools view -c ~{bam}
    >>>
    runtime {
        docker: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
    }
    output {
        Int count = read_int(stdout())
    }
}
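To make the data flow concrete, the following Python sketch mimics the workflow's structure on in-memory data. A hypothetical slice_bam groups stand-in alignment records by chromosome, and count_bam runs once per slice concurrently, as in the scatter block. The records are plain tuples rather than real BAM data, and threads stand in for what the platform runs as separate jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Stand-in for an alignment record: (chromosome name, read name).
Record = Tuple[str, str]


def slice_bam(records: List[Record], num_chrom: int = 22) -> List[List[Record]]:
    """Mimic slice_bam: one slice for each chromosome 1..num_chrom."""
    by_chrom: Dict[str, List[Record]] = {str(i): [] for i in range(1, num_chrom + 1)}
    for rec in records:
        if rec[0] in by_chrom:  # records on other contigs (e.g. X, MT) are skipped
            by_chrom[rec[0]].append(rec)
    return [by_chrom[str(i)] for i in range(1, num_chrom + 1)]


def count_bam(slice_records: List[Record]) -> int:
    """Mimic `samtools view -c`: count the records in one slice."""
    return len(slice_records)


def bam_chrom_counter(records: List[Record], num_chrom: int = 22) -> List[int]:
    slices = slice_bam(records, num_chrom)
    # The scatter: apply count_bam to every slice concurrently.
    # pool.map yields results in input order, just as a WDL scatter
    # collects its outputs in the order of the array it iterates over.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(count_bam, slices))


if __name__ == "__main__":
    reads = [("1", "r1"), ("1", "r2"), ("2", "r3"), ("X", "r4")]
    print(bam_chrom_counter(reads, num_chrom=3))  # [2, 1, 0]
```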

From the command line, compile the workflow to the DNAnexus Platform with the compile action of dxCompiler.jar.

This compiles the source WDL file into the following platform objects in the DNAnexus project given on the command line (project-xxxx in this example), under the folder /my/workflows/:

  • A workflow bam_chrom_counter

  • Two applets that can be called independently: slice_bam and count_bam

  • A few auxiliary applets that process workflow inputs and outputs and launch the scatter

You can review the structure of the compiled DNAnexus workflow with the dxCompiler describe command. Its output shows the generated DNAnexus workflows and applets as a tree that reflects their caller/callee relationships.
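The compile and describe steps above can be scripted. The Python sketch below only assembles the argument lists and offers a small runner; the JAR path, the helper names, and the placeholder workflow ID are assumptions, and the -project and -folder flags carry the project and folder named above.

```python
import subprocess
from typing import List

DXCOMPILER_JAR = "dxCompiler.jar"  # assumed path to the downloaded release JAR


def compile_command(source: str, project: str, folder: str,
                    jar: str = DXCOMPILER_JAR) -> List[str]:
    """Build `java -jar dxCompiler.jar compile <source> -project ... -folder ...`."""
    return ["java", "-jar", jar, "compile", source,
            "-project", project, "-folder", folder]


def describe_command(workflow_id: str, jar: str = DXCOMPILER_JAR) -> List[str]:
    """Build `java -jar dxCompiler.jar describe <workflow-id>`."""
    return ["java", "-jar", jar, "describe", workflow_id]


def run(cmd: List[str]) -> str:
    """Run a command, raising CalledProcessError on failure; return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    print(" ".join(compile_command("bam_chrom_counter.wdl", "project-xxxx", "/my/workflows/")))
    # "workflow-xxxx" is a placeholder for the ID of the compiled workflow.
    print(" ".join(describe_command("workflow-xxxx")))
```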

The generated workflow can be executed from the UI or with dx run from the DNAnexus command-line client. For example, to run the workflow on the input BAM file project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV, pass that file ID as the value of the workflow's bam input in a dx run command.
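A minimal Python sketch of that dx run invocation is shown below. It assumes that dxCompiler exposes workflow-level inputs on a stage named stage-common, so the input is addressed as stage-common.bam; confirm the actual input names of your compiled workflow (for example, with dx describe) before relying on this. The 24-character ID check is also an assumption about the shape of DNAnexus object IDs, used only as a sanity check.

```python
import re
from typing import List

# Assumed ID shape: a class prefix followed by 24 alphanumeric characters.
_ID = r"[0-9A-Za-z]{24}"
FILE_REF = re.compile(rf"^project-{_ID}:file-{_ID}$")


def dx_run_command(workflow: str, bam_ref: str,
                   stage: str = "stage-common") -> List[str]:
    """Build `dx run <workflow> -i<stage>.bam=<project-...:file-...>`.

    'stage-common' is an assumed stage name for workflow-level inputs.
    """
    if not FILE_REF.match(bam_ref):
        raise ValueError(f"expected project-...:file-..., got {bam_ref!r}")
    return ["dx", "run", workflow, f"-i{stage}.bam={bam_ref}"]


if __name__ == "__main__":
    ref = "project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV"
    print(" ".join(dx_run_command("bam_chrom_counter", ref)))
```

Validating the reference before building the command catches a pasted file ID that is missing its project prefix.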

Alternatively, you can convert a Cromwell-format JSON input file into DNAnexus format when compiling the workflow, and then pass the resulting DNAnexus input file to dx run with the -f option, as described in detail in the dxCompiler expert options documentation.
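The two-step flow can be sketched in Python as follows. A Cromwell inputs file maps fully qualified names such as bam_chrom_counter.bam to values. The sketch assumes the -inputs compile option and that the converted file is written next to the original with a .dx.json extension; the file name bam_chrom_counter_input.json and the helper names are hypothetical, so check the expert options documentation for the exact behavior.

```python
from pathlib import Path
from typing import List, Tuple


def dx_input_path(cromwell_inputs: str) -> str:
    """Assumed naming: <name>.json is converted to <name>.dx.json alongside it."""
    p = Path(cromwell_inputs)
    if p.suffix != ".json":
        raise ValueError(f"expected a .json inputs file, got {cromwell_inputs!r}")
    return str(p.with_name(p.stem + ".dx.json"))


def inputs_commands(source: str, project: str, cromwell_inputs: str,
                    workflow: str,
                    jar: str = "dxCompiler.jar") -> Tuple[List[str], List[str]]:
    """Compile with -inputs, then run the workflow with the converted file via -f."""
    compile_cmd = ["java", "-jar", jar, "compile", source,
                   "-project", project, "-inputs", cromwell_inputs]
    run_cmd = ["dx", "run", workflow, "-f", dx_input_path(cromwell_inputs)]
    return compile_cmd, run_cmd


if __name__ == "__main__":
    for cmd in inputs_commands("bam_chrom_counter.wdl", "project-xxxx",
                               "bam_chrom_counter_input.json", "bam_chrom_counter"):
        print(" ".join(cmd))
```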

After launching the workflow analysis, you can monitor its execution from either the CLI or the UI.

The CLI sessions show the executed workflow:

The snapshot below shows what you see from the UI when the workflow execution is completed:

CWL Example

Preprocessing a CWL Workflow

dxCompiler requires the source CWL to be "packed" into a cwl.json file, which contains a single compound workflow with all of its dependent processes included. You may also need to upgrade your workflow to CWL v1.2.
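One way to sanity-check that a cwl.json file is self-contained is to look for workflow steps whose run field still points at an external file. The Python sketch below treats an inline process object or a "#fragment" reference as packed and reports any other string; this heuristic and the helper names are assumptions, not dxCompiler's own check.

```python
import json
from typing import Any, Iterator, List


def _steps(workflow: dict) -> Iterator[dict]:
    """Yield step objects whether `steps` uses list form or map (id -> step) form."""
    steps = workflow.get("steps", [])
    if isinstance(steps, dict):
        yield from steps.values()
    else:
        yield from steps


def external_run_refs(node: Any) -> List[str]:
    """Return `run` values that reference files outside the document.

    Recurses through the whole document, so nested inline workflows are checked too.
    """
    found: List[str] = []
    if isinstance(node, dict):
        if node.get("class") == "Workflow":
            for step in _steps(node):
                run = step.get("run")
                if isinstance(run, str) and not run.startswith("#"):
                    found.append(run)
        for value in node.values():
            found.extend(external_run_refs(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(external_run_refs(item))
    return found


def check_packed(path: str) -> List[str]:
    """Load a .cwl.json file and return any external `run` references."""
    with open(path) as fh:
        return external_run_refs(json.load(fh))
```

An empty result suggests every step's process is embedded; a non-empty one names the files that still need packing.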

The following steps use a CWL version of the bam_chrom_counter workflow, similar to the WDL example above, to illustrate upgrading, packing, and running a CWL workflow. This workflow is written in CWL v1.0; the top-level Workflow in bam_chrom_counter.cwl calls the two CommandLineTool tools defined in slice_bam.cwl and count_bam.cwl.

Before compilation, follow the steps below to preprocess these CWL files:

  1. De-localize all local paths referenced in the CWL: if the CWL specifies a local path, for example in a schema or as the default value of a file-type input (such as the default path "path/to/my/input_bam" for input bam in bam_chrom_counter.cwl), upload the file to a DNAnexus project and replace the local path in the CWL with the file's full DNAnexus URI, for example dx://project-XXX:file-YYY.

  2. Install cwl-upgrader and upgrade the CWL files to v1.2 (required here because the files are written in CWL v1.0).

  3. Install the sbpack package and run its cwlpack command on the top-level workflow file to build a single packed file, bam_chrom_counter.cwl.json, that contains the top-level workflow and all the steps it depends on.
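The steps above can be sketched in Python. delocalize rewrites the location or path of File objects in an already-parsed CWL document (for example, one loaded with a YAML parser) using a mapping from local paths to dx:// URIs obtained after uploading. The command builders assume that cwl-upgrader takes the files to upgrade as arguments and that cwlpack accepts --add-ids and --json and writes the packed document to stdout; verify these with each tool's --help. All helper names are hypothetical.

```python
import subprocess
from typing import Any, Dict, List


def to_dx_uri(project_id: str, file_id: str) -> str:
    """Format a platform file reference as dx://project-XXX:file-YYY."""
    return f"dx://{project_id}:{file_id}"


def delocalize(node: Any, uploads: Dict[str, str]) -> Any:
    """Return a copy of a parsed CWL document with local File paths replaced.

    `uploads` maps each local path to the dx:// URI of its uploaded copy.
    Only the `location`/`path` fields of objects with class File are rewritten.
    """
    if isinstance(node, list):
        return [delocalize(item, uploads) for item in node]
    if not isinstance(node, dict):
        return node
    out = {key: delocalize(value, uploads) for key, value in node.items()}
    if out.get("class") == "File":
        for key in ("location", "path"):
            value = out.get(key)
            if isinstance(value, str) and value in uploads:
                out[key] = uploads[value]
    return out


def upgrade_command(cwl_files: List[str]) -> List[str]:
    # Assumed CLI: cwl-upgrader upgrades each listed file to the latest CWL version.
    return ["cwl-upgrader", *cwl_files]


def pack_command(top_level: str) -> List[str]:
    # Assumed sbpack flags: --add-ids and --json (check `cwlpack --help`).
    return ["cwlpack", "--add-ids", "--json", top_level]


def pack(top_level: str, packed_out: str) -> None:
    """Run cwlpack and write its stdout to the packed .cwl.json file."""
    with open(packed_out, "w") as fh:
        subprocess.run(pack_command(top_level), stdout=fh, check=True)
```

Returning a copy from delocalize keeps the original document unchanged, which makes it easy to diff the before and after versions.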

Validating a Workflow

dxCompiler compiles tools and workflows written to the CWL v1.2 standard. You can use cwltool --validate to validate the packed CWL file before compiling it.

Compiling and Running a Workflow

Once the workflow has been upgraded and packed as described above, compile it into a DNAnexus workflow with dxCompiler and run it.
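A small Python driver can chain validation and compilation so that compilation never starts on a file cwltool rejects. The sketch assumes the packed file name from the preprocessing steps, the project-xxxx and /my/workflows/ placeholders from the WDL example, and a dxCompiler.jar path; the helper names are hypothetical. By default it performs a dry run and only prints the commands.

```python
import subprocess
from typing import List, Sequence


def cwl_pipeline(packed: str, project: str, folder: str,
                 jar: str = "dxCompiler.jar") -> List[List[str]]:
    """Commands to validate a packed CWL file and then compile it."""
    return [
        ["cwltool", "--validate", packed],
        ["java", "-jar", jar, "compile", packed, "-project", project, "-folder", folder],
    ]


def run_all(commands: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute commands in order, stopping at the first failure."""
    executed: List[str] = []
    for cmd in commands:
        line = " ".join(cmd)
        if dry_run:
            print(line)
        else:
            subprocess.run(list(cmd), check=True)  # raises CalledProcessError on failure
        executed.append(line)
    return executed


if __name__ == "__main__":
    run_all(cwl_pipeline("bam_chrom_counter.cwl.json", "project-xxxx", "/my/workflows/"))
```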

Limitations

  • WDL and CWL

    • Calls with missing arguments have limited support

    • All task and workflow names must be unique across the entire import tree

      • For example, if A.wdl imports B.wdl and A.wdl defines workflow foo, then B.wdl cannot have a workflow or task named foo

    • Subworkflows built from higher-level workflows are not intended to be used on their own

  • WDL only

    • Workflows with forward references are not yet supported. A forward reference is a variable referenced before it is declared.

    • The alternative workflow output syntax that has been deprecated since WDL draft2 is not supported

    • The call ... after syntax introduced in WDL 1.1 is not yet supported

  • CWL only

    • Calling native DNAnexus apps or applets from a CWL workflow using dxni is not supported.

    • SoftwareRequirement and InplaceUpdateRequirement are not yet supported

    • Publishing a dxCompiler-generated workflow as a global workflow is not supported

    • Applet and job reuse is not supported
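The name-uniqueness rule lends itself to a pre-compilation check. The Python sketch below walks an import tree from a root file and reports every task or workflow name defined in more than one file. It assumes the import edges and defined names have already been extracted (it does not parse WDL or CWL itself), and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def duplicate_names(root: str,
                    imports: Dict[str, List[str]],
                    defined: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Find task/workflow names defined in more than one file of an import tree.

    imports: file -> files it imports; defined: file -> task/workflow names.
    Returns name -> sorted list of the files defining it, for duplicates only.
    """
    seen: Set[str] = set()
    owners: Dict[str, Set[str]] = defaultdict(set)
    stack = [root]
    while stack:
        current = stack.pop()
        if current in seen:  # guards against import cycles and shared imports
            continue
        seen.add(current)
        for name in defined.get(current, []):
            owners[name].add(current)
        stack.extend(imports.get(current, []))
    return {name: sorted(files) for name, files in owners.items() if len(files) > 1}


if __name__ == "__main__":
    # The example from the limitation: A.wdl imports B.wdl, and both define foo.
    print(duplicate_names("A.wdl", {"A.wdl": ["B.wdl"]},
                          {"A.wdl": ["foo"], "B.wdl": ["foo", "bar"]}))
```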

Help String

You can learn about dxCompiler's capabilities by running java -jar dxCompiler.jar without arguments or with the help action.
