dxCompiler

An introduction to using dxCompiler, a tool for compiling WDL workflows to the DNAnexus platform

dxCompiler is a tool for compiling pipelines written in the Workflow Description Language (WDL) to equivalent workflows on the DNAnexus platform.

Below we introduce a few use cases for dxCompiler. The tool can be downloaded from the dxCompiler GitHub repository, which also documents more advanced options, including DNAnexus extensions and integration with private Docker repositories.

WDL Example

The bam_chrom_counter WDL workflow below takes a BAM file as input, calls the slice_bam task to split the BAM file by chromosome, then calls the count_bam task on each per-chromosome slice. The results are a BAM index file and an array with the number of reads per chromosome.

$ cat bam_chrom_counter.wdl
version 1.0

workflow bam_chrom_counter {
  input {
    File bam
  }
  call slice_bam {
    input: bam = bam
  }
  scatter (slice in slice_bam.slices) {
    call count_bam {
      input: bam = slice
    }
  }
  output {
    File bai = slice_bam.bai
    Array[Int] count = count_bam.count
  }
}

task slice_bam {
  input {
    File bam
    Int num_chrom = 22
  }
  command <<<
    set -ex
    samtools index ~{bam}
    mkdir slices/
    for i in `seq ~{num_chrom}`; do
      samtools view -b ~{bam} -o slices/$i.bam $i
    done
  >>>
  runtime {
    docker: "quay.io/ucsc_cgl/samtools"
  }
  output {
    File bai = "~{bam}.bai"
    Array[File] slices = glob("slices/*.bam")
  }
}

task count_bam {
  input {
    File bam
  }
  command <<<
    samtools view -c ~{bam}
  >>>
  runtime {
    docker: "quay.io/ucsc_cgl/samtools"
  }
  output {
    Int count = read_int(stdout())
  }
}

After installing dxCompiler following the instructions in the Downloads section, the following command will compile the WDL workflow into a DNAnexus platform workflow.

$ java -jar dxCompiler.jar compile bam_chrom_counter.wdl
workflow-G3P0jFQ0Fgk3Z7855fqKyjPy
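
The compile step can also take a destination and an inputs file, using the -project, -folder, and -inputs options listed in the help string further down. The sketch below is illustrative rather than definitive: project-xxxx and /bam_chrom_counter are hypothetical placeholders, and the dx:// URI used to reference the platform file is an assumption about the accepted input syntax. The compile command runs only if java and dxCompiler.jar are present in the current directory.

```shell
# Hypothetical destination; replace with your own project ID and folder.
PROJECT="project-xxxx"
FOLDER="/bam_chrom_counter"

# Standard-format inputs file: keys are <workflow name>.<input name>.
# Assumption: platform files can be referenced with a dx:// URI.
cat > bam_chrom_counter_input.json <<'EOF'
{
  "bam_chrom_counter.bam": "dx://Exome Analysis Demo:/Results/SRR504516.bam"
}
EOF

# Compile only when Java and the dxCompiler JAR are available.
if command -v java >/dev/null 2>&1 && [ -f dxCompiler.jar ]; then
  java -jar dxCompiler.jar compile bam_chrom_counter.wdl \
    -project "$PROJECT" \
    -folder "$FOLDER" \
    -inputs bam_chrom_counter_input.json
else
  echo "java or dxCompiler.jar not found; skipping compile" >&2
fi
```

According to the help string, a DNAnexus JSON input file is generated for each standard input file. Assuming it is written next to the original (for example as bam_chrom_counter_input.dx.json, a name not confirmed here), it could be passed to dx run with the -f option instead of typing each input on the command line.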

You can review the structure of the compiled DNAnexus workflow with dxCompiler's describe subcommand. The output shows the generated DNAnexus workflows and applets as a tree of caller/callee relationships.

$ java -jar dxCompiler.jar describe workflow-G3P0jFQ0Fgk3Z7855fqKyjPy --pretty
Workflow: bam_chrom_counter
├───App Inputs: common
├───App Task: slice_bam
├───App Fragment: scatter (slice in slice_bam.slices)
│   └───App Task: count_bam
└───App Outputs: outputs

The commands below run the DNAnexus workflow on the input BAM file SRR504516.bam, hosted in the Results folder of the public Exome Analysis Demo DNAnexus project, and then display the execution tree.

$ dx run workflow-G3P0jFQ0Fgk3Z7855fqKyjPy \
    -istage-common.bam="Exome Analysis Demo":/Results/SRR504516.bam
analysis-G3P0kGj0Fgk8ZY0645QYZ9bB
$ dx find executions -n1
* bam_chrom_counter (done) analysis-G3P0kGj0Fgk8ZY0645QYZ9bB
│ demo-user 2021-07-01 11:42:59
├── * outputs (bam_chrom_counter_outputs:main) (done) job-G3P0kkQ0Fgk8ZY0645QYZ9bX
│   demo-user 2021-07-01 11:43:44 (runtime 0:00:42)
├── * scatter (slice in slice_bam.slices) (bam_chrom_counter_frag_stage-1:main) (done) job-G3P0kj00Fgk8ZY0645QYZ9bV
│   │ demo-user 2021-07-01 11:43:38 (runtime 0:00:46)
│   ├── collect_scatter (bam_chrom_counter_frag_stage-1:collect) (done) job-G3P1Gb80Fgk8gx406ppxVZX6
│   │   demo-user 2021-07-01 12:24:05 (runtime 0:00:42)
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1Gb80Fgk3ZjYx714yFZyQ
│   │   demo-user 2021-07-01 12:24:05 (runtime 0:01:31)
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1Gb00FgkBb45v7197QVb2
│   │   demo-user 2021-07-01 12:24:04 (runtime 0:01:34)
...
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZQ0Fgk0QBKp6pVZgzFg
│   │   demo-user 2021-07-01 12:24:01 (runtime 0:01:33)
│   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZ80Fgk9vFj26Vg599P4
│   │   demo-user 2021-07-01 12:24:01 (runtime 0:01:33)
│   └── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZ80Fgk6VJ5v6kGYp5k9
│       demo-user 2021-07-01 12:24:01 (runtime 0:01:53)
├── * slice_bam (done) job-G3P0kfQ0Fgk8ZY0645QYZ9bQ
│   demo-user 2021-07-01 11:43:31 (runtime 0:37:14)
└── * common (bam_chrom_counter_common:main) (done) job-G3P0kZj0Fgk8ZY0645QYZ9bP
    demo-user 2021-07-01 11:43:25 (runtime 0:00:45)
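
Once the analysis is done, the per-chromosome counts can be read back from its description. The sketch below is one possible approach, not a prescribed one: it assumes the workflow output appears under the key stage-outputs.count (by analogy with the stage-common.bam input name used above, not confirmed here), and it relies on jq, a separate JSON tool that is not part of the DNAnexus toolkit. The helper sum_counts is a hypothetical name introduced for illustration.

```shell
# Analysis ID printed by `dx run` in the example above.
ANALYSIS_ID="analysis-G3P0kGj0Fgk8ZY0645QYZ9bB"

# Sum integers read from standard input, one per line.
sum_counts() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Query the platform only when both dx and jq are installed.
if command -v dx >/dev/null 2>&1 && command -v jq >/dev/null 2>&1; then
  # Assumption: the Array[Int] output is stored under "stage-outputs.count".
  dx describe "$ANALYSIS_ID" --json \
    | jq -r '.output["stage-outputs.count"][]' \
    | sum_counts
else
  echo "dx or jq not found; skipping" >&2
fi
```

If the key name differs in your workflow, inspecting the full JSON from dx describe with the --json flag shows the actual output field names.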

Help String

dxCompiler's capabilities are outlined in its help string below.

$ java -jar dxCompiler.jar version
2.4.7
$ java -jar dxCompiler.jar help
java -jar dxCompiler.jar <action> <parameters> [options]
Actions:
  version
    Prints the dxCompiler version.
  config
    Prints the current dxCompiler configuration.
  describe <DxWorkflow ID>
    Generate the JSON execution tree for a given DNAnexus workflow ID.
    The workflow needs to be have been previously compiled by dxCompiler.
    options
      -pretty                Print exec tree in "pretty" text format instead of JSON.
  compile <WDL file>
    Compile a WDL file to a DNAnexus workflow or applet.
    options
      -archive               Archive older versions of applets.
      -compileMode <string>  Compilation mode - a debugging flag for internal use.
      -defaults <string>     JSON file with standard-formatted default values.
      -destination <string>  Full platform path (project:/folder).
      -execTree [json,pretty]
                             Print a JSON representation of the workflow.
      -extras <string>       JSON file with extra options (see documentation).
      -inputs <string>       JSON file with standard-formatted input values. May be
                             specified multiple times. A DNAnexus JSON input file is
                             generated for each standard input file.
      -locked                Create a locked workflow. When running a locked workflow,
                             input values may only be specified for the top-level workflow.
      -leaveWorkflowsOpen    Leave created workflows open (otherwise they are closed).
      -p | -imports <string> Directory to search for imported WDL files. May be specified
                             multiple times.
      -projectWideReuse      Look for existing applets/workflows in the entire project
                             before generating new ones. The default search scope is the
                             target folder only.
      -reorg                 Reorganize workflow output files.
      -runtimeDebugLevel [0,1,2]
                             How much debug information to write to the job log at runtime.
                             Log the minimum (0), intermediate (1, the default), or all
                             debug information (2, for internal debugging).
      -separateOutputs       Store the output files of each call in a separate folder. The
                             default behavior is to put all outputs in the same folder.
      -streamFiles [all,none,perfile]
                             Whether to mount all files with dxfuse (do not use the
                             download agent), to mount no files with dxfuse (only use
                             download agent), or to respect the per-file settings in WDL
                             parameter_meta sections (default).
      -useManifests          Use manifest files for all workflow and applet inputs and
                             outputs. Implies -locked.
      -waitOnUpload          Whether to wait for each file upload to complete.
  dxni
    DNAnexus Native call Interface. Creates stubs for calling DNAnexus executables
    (apps/applets/workflows), and stores them as WDL tasks in a local file. Enables
    calling existing platform executables without modification.
    options:
      -apps [include,exclude,only]
                             Whether to 'include' apps, 'exclude' apps (the default), or
                             'only' generate app stubs.
      -f | force             Delete any existing output file.
      -o <path>              Destination file for WDL task definitions (defaults to
                             stdout).
      -path <string>         Name of a specific app or a path to a specific applet.
      -r | recursive         Search recursively for applets in the target folder.
Common options
    -folder <string>         Platform folder (defaults to '/').
    -project <string>        Platform project (defaults to currently selected project).
    -language <string> [ver] Which language to use? May be WDL or CWL. You can optionally
                             specify a version. Currently, WDL draft-2, 1.0, and 1.1 are
                             fully supported and WDL development and CWL 1.2 are partially
                             supported. The default is to auto-detect the language from the
                             source file.
    -quiet                   Do not print warnings or informational outputs.
    -verbose                 Print detailed logging.
    -verboseKey <module>     Print verbose output only for a specific module. May be
                             specified multiple times.
    -logFile <path>          File to use for logging output; defaults to stderr.
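
As an illustration of the dxni action described above, the sketch below assembles a command that would generate WDL task stubs for the applets in a platform folder. It uses only flags listed in the help string; project-xxxx, /tools, and dx_extern.wdl are hypothetical placeholders. The command is printed first and executed only if java and dxCompiler.jar are available.

```shell
# Hypothetical project and folder holding the native applets to wrap.
PROJECT="project-xxxx"
FOLDER="/tools"
STUBS="dx_extern.wdl"   # hypothetical output file for the generated WDL tasks

# Assemble the dxni command from the flags listed in the help string.
set -- java -jar dxCompiler.jar dxni \
  -project "$PROJECT" \
  -folder "$FOLDER" \
  -r \
  -o "$STUBS"

# Show the command before running it.
echo "$*"

if command -v java >/dev/null 2>&1 && [ -f dxCompiler.jar ]; then
  "$@"
else
  echo "java or dxCompiler.jar not found; command not run" >&2
fi
```

The resulting file should contain one WDL task per platform executable found, so a workflow could import it and call those tasks like any other WDL task. The exact task names depend on the executables in the folder and are not shown here.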