dxCompiler
An introduction to using dxCompiler, a tool for compiling WDL and CWL workflows on the DNAnexus platform
dxCompiler is a tool for compiling pipelines written in the Workflow Description Language (WDL) and the Common Workflow Language (CWL) to equivalent workflows on the DNAnexus platform.
Below we introduce a few use cases for dxCompiler. The tool can be downloaded from the dxCompiler GitHub repository, which also documents more advanced options, including DNAnexus extensions and integration with private Docker repositories.

dxCompiler Setup

For information on how to set up dxCompiler, see the instructions on the dxCompiler GitHub page.

WDL Example

Validate the workflow

dxCompiler uses wdlTools, a parser that adheres strictly to the WDL specifications. Most of the problematic automatic type conversions that are allowed by some other WDL runtime engines are not allowed by dxCompiler. Please use the command line tools in wdlTools (e.g. check and lint) to validate your WDL files before trying to compile them with dxCompiler.

Compile and run workflow

The bam_chrom_counter workflow is written in WDL. Task slice_bam splits a BAM file into an array of sub-files. Task count_bam counts the number of alignments in a BAM file. The workflow takes an input BAM file, calls slice_bam to split it by chromosome, and calls count_bam in parallel on each slice. The results comprise a BAM index file and an array with the number of reads per chromosome.
bam_chrom_counter.wdl

    version 1.0

    workflow bam_chrom_counter {
      input {
        File bam
      }

      call slice_bam {
        input : bam = bam
      }
      scatter (slice in slice_bam.slices) {
        call count_bam {
          input: bam = slice
        }
      }
      output {
        File bai = slice_bam.bai
        Array[Int] count = count_bam.count
      }
    }

    task slice_bam {
      input {
        File bam
        Int num_chrom = 22
      }
      command <<<
        set -ex
        samtools index ~{bam}
        mkdir slices/
        for i in `seq ~{num_chrom}`; do
          samtools view -b ~{bam} -o slices/$i.bam $i
        done
      >>>
      runtime {
        docker: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
      }
      output {
        File bai = "~{bam}.bai"
        Array[File] slices = glob("slices/*.bam")
      }
    }

    task count_bam {
      input {
        File bam
      }

      command <<<
        samtools view -c ~{bam}
      >>>
      runtime {
        docker: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
      }
      output {
        Int count = read_int(stdout())
      }
    }

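The data flow of this workflow can be mimicked in plain Python, which may help when reasoning about what the scatter produces. This is only an illustrative sketch: the alignments are toy (chromosome, read name) tuples rather than a real BAM file, the functions merely borrow the WDL task names, and the ordering of slices in the real workflow is determined by glob(), which is not modelled here.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def slice_bam(alignments, num_chrom=22):
    """Group toy (chromosome, read_name) records by chromosome 1..num_chrom."""
    slices = defaultdict(list)
    for chrom, read in alignments:
        if 1 <= chrom <= num_chrom:
            slices[chrom].append(read)
    # One slice per chromosome, in numeric order (empty slices included).
    return [slices[c] for c in range(1, num_chrom + 1)]

def count_bam(bam_slice):
    """Count the alignments in one slice (stands in for `samtools view -c`)."""
    return len(bam_slice)

def bam_chrom_counter(alignments, num_chrom=22):
    """Scatter count_bam over the slices and gather the counts in order."""
    slices = slice_bam(alignments, num_chrom)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(count_bam, slices))

toy = [(1, "r1"), (2, "r2"), (1, "r3"), (3, "r4")]
counts = bam_chrom_counter(toy, num_chrom=3)
print(counts)
```

As in the WDL scatter, the gathered array has one entry per slice, in the same order as the slices.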
From the command line, we can compile the workflow to the DNAnexus platform using the dxCompiler jar file.

    $ java -jar dxCompiler.jar compile bam_chrom_counter.wdl \
        -project project-xxxx -folder /my/workflows/

This compiles the source WDL file to several platform objects in the specified DNAnexus project project-xxxx, under the folder /my/workflows/:
  • A workflow bam_chrom_counter
  • Two applets that can be called independently: slice_bam, and count_bam
  • A few auxiliary applets that process workflow inputs, outputs, and launch the scatter.
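
If you compile from a script rather than an interactive shell, the command line can be assembled programmatically. The following is a minimal sketch, assuming the jar sits in the working directory as dxCompiler.jar; compile_command is a hypothetical helper, not part of dxCompiler, and it only builds the argument list without executing anything. The -inputs and -locked options are the ones listed in the help string at the end of this page.

```python
import shlex

def compile_command(source, project, folder, inputs=None, locked=False,
                    jar="dxCompiler.jar"):
    """Return the argv list for `dxCompiler compile` (nothing is executed)."""
    argv = ["java", "-jar", jar, "compile", source,
            "-project", project, "-folder", folder]
    if inputs is not None:
        argv += ["-inputs", inputs]  # standard-formatted input JSON file
    if locked:
        argv.append("-locked")       # create a locked workflow
    return argv

argv = compile_command("bam_chrom_counter.wdl", "project-xxxx", "/my/workflows/")
print(shlex.join(argv))
```

The returned list can be passed to subprocess.run(argv, check=True) once Java and the jar are available.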
You can review the structure of the compiled DNAnexus workflow using the describe dxCompiler subcommand. The output shows the generated DNAnexus workflows and applets in a tree that describes a caller/callee relationship.

    $ java -jar dxCompiler.jar describe workflow-G3P0jFQ0Fgk3Z7855fqKyjPy --pretty
    Workflow: bam_chrom_counter
    ├───App Inputs: common
    ├───App Task: slice_bam
    ├───App Fragment: scatter (slice in slice_bam.slices)
    │   └───App Task: count_bam
    └───App Outputs: outputs

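The caller/callee relationship in the pretty-printed tree can also be recovered programmatically. The sketch below is an assumption-laden illustration, not a dxCompiler API: it assumes each nesting level is indented by four columns and that every node line has the form "App <kind>: <name>", as in the output above. parse_exec_tree is a hypothetical helper.

```python
import re

# A branch marker (├─── or └───) followed by "App <kind>: <name>".
NODE = re.compile(r"^(?P<indent>[│ ]*)[├└]───App (?P<kind>\w+): (?P<name>.+)$")

def parse_exec_tree(text, indent_width=4):
    """Return (depth, kind, name) tuples; depth 1 is a direct child of the workflow."""
    nodes = []
    for line in text.splitlines():
        m = NODE.match(line)
        if m:
            depth = len(m.group("indent")) // indent_width + 1
            nodes.append((depth, m.group("kind"), m.group("name")))
    return nodes

sample = """Workflow: bam_chrom_counter
├───App Inputs: common
├───App Task: slice_bam
├───App Fragment: scatter (slice in slice_bam.slices)
│   └───App Task: count_bam
└───App Outputs: outputs"""

for depth, kind, name in parse_exec_tree(sample):
    print("  " * (depth - 1) + f"{kind}: {name}")
```

Here count_bam is reported at depth 2 because it is called by the scatter fragment rather than directly by the workflow.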
The generated workflow can be executed from the web UI (see instructions here) or via the DNAnexus command-line client. For example, to run the workflow with the input bam file project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV, use the following command:

    dx run bam_chrom_counter \
        -istage-common.bam=project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV

Alternatively, you can convert a Cromwell JSON format input file into DNAnexus format when compiling the workflow. You can then pass the DNAnexus input file to dx run using the -f option, as described in detail here.

    $ java -jar dxCompiler.jar compile bam_chrom_counter.wdl \
        -project project-xxxx -folder /my/workflows/ \
        -inputs bam_chrom_counter_input.json
    $ dx run bam_chrom_counter -f bam_chrom_counter_input.dx.json

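To give a feel for what such a conversion involves, here is a simplified sketch of the idea for the single File input used in this example. It is not dxCompiler's implementation, and the real converter handles many more cases. The assumptions are: Cromwell-style keys have the form "<workflow>.<input>", the DNAnexus key is "stage-common.<input>" (matching the -i argument shown earlier), and a dx:// URI becomes a standard DNAnexus link object. The identifiers project-xxxx and file-yyyy are placeholders.

```python
import json

def to_dx_link(uri):
    """Turn 'dx://project-xxx:file-yyy' into a DNAnexus link object."""
    project, file_id = uri[len("dx://"):].split(":", 1)
    return {"$dnanexus_link": {"project": project, "id": file_id}}

def convert_inputs(cromwell_inputs, workflow):
    """Map top-level '<workflow>.<name>' keys to 'stage-common.<name>' keys."""
    prefix = workflow + "."
    dx_inputs = {}
    for key, value in cromwell_inputs.items():
        if not key.startswith(prefix):
            continue  # not an input of this workflow
        name = key[len(prefix):]
        if "." in name:
            continue  # inputs of nested calls are out of scope for this sketch
        if isinstance(value, str) and value.startswith("dx://"):
            value = to_dx_link(value)
        dx_inputs[f"stage-common.{name}"] = value
    return dx_inputs

cromwell = {"bam_chrom_counter.bam": "dx://project-xxxx:file-yyyy"}
dx_inputs = convert_inputs(cromwell, "bam_chrom_counter")
print(json.dumps(dx_inputs, indent=2))
```

In practice, rely on the file generated by the -inputs option rather than on a hand-written conversion like this one.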
After launching the workflow analysis, you can monitor it on the CLI following these instructions or from the UI as suggested here. The CLI session below shows the executed workflow:

    $ dx find executions -n1
    * bam_chrom_counter (done) analysis-G3P0kGj0Fgk8ZY0645QYZ9bB
    │   demo-user 2022-03-01 11:42:59
    ├── * outputs (bam_chrom_counter_outputs:main) (done) job-G3P0kkQ0Fgk8ZY0645QYZ9bX
    │   demo-user 2022-03-01 11:43:44 (runtime 0:00:42)
    ├── * scatter (slice in slice_bam.slices) (bam_chrom_counter_frag_stage-1:main) (done) job-G3P0kj00Fgk8ZY0645QYZ9bV
    │   │   demo-user 2022-03-01 11:43:38 (runtime 0:00:46)
    │   ├── collect_scatter (bam_chrom_counter_frag_stage-1:collect) (done) job-G3P1Gb80Fgk8gx406ppxVZX6
    │   │   demo-user 2022-03-01 12:24:05 (runtime 0:00:42)
    │   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1Gb80Fgk3ZjYx714yFZyQ
    │   │   demo-user 2022-03-01 12:24:05 (runtime 0:01:31)
    │   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1Gb00FgkBb45v7197QVb2
    │   │   demo-user 2022-03-01 12:24:04 (runtime 0:01:34)
    ...
    │   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZQ0Fgk0QBKp6pVZgzFg
    │   │   demo-user 2022-03-01 12:24:01 (runtime 0:01:33)
    │   ├── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZ80Fgk9vFj26Vg599P4
    │   │   demo-user 2022-03-01 12:24:01 (runtime 0:01:33)
    │   └── count_bam dx://project-FZjPqzQ0Fgk1PPxyBJq4ZP0J:file-G3P1... (count_bam:main) (done) job-G3P1GZ80Fgk6VJ5v6kGYp5k9
    │       demo-user 2022-03-01 12:24:01 (runtime 0:01:53)
    ├── * slice_bam (done) job-G3P0kfQ0Fgk8ZY0645QYZ9bQ
    │   demo-user 2022-03-01 11:43:31 (runtime 0:37:14)
    └── * common (bam_chrom_counter_common:main) (done) job-G3P0kZj0Fgk8ZY0645QYZ9bP
        demo-user 2022-03-01 11:43:25 (runtime 0:00:45)

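When a listing like this grows long, it can be handy to pull out the per-job runtimes and see which job dominates. The following sketch assumes only that runtimes appear as "(runtime H:MM:SS)", as in the session above; runtimes_in_seconds is a hypothetical helper, and the sample text is a shortened copy of three lines from that session.

```python
import re

RUNTIME = re.compile(r"\(runtime (\d+):(\d{2}):(\d{2})\)")

def runtimes_in_seconds(text):
    """Return every '(runtime H:MM:SS)' value found in the text, in seconds."""
    return [int(h) * 3600 + int(m) * 60 + int(s)
            for h, m, s in RUNTIME.findall(text)]

sample = """\
│   │   demo-user 2022-03-01 12:24:05 (runtime 0:01:31)
│   │   demo-user 2022-03-01 12:24:04 (runtime 0:01:34)
│   demo-user 2022-03-01 11:43:31 (runtime 0:37:14)
"""
seconds = runtimes_in_seconds(sample)
print(seconds, max(seconds))
```

In the session above, the slice_bam job (0:37:14) accounts for most of the elapsed time, while each count_bam job takes under two minutes.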
The snapshot below shows what you will see from the UI when the workflow execution is completed:

CWL Example

Preprocess CWL workflow

dxCompiler requires the source CWL file to be "packed" as a cwl.json file, which contains a single compound workflow with all the dependent processes included. Additionally, you may need to upgrade the version of your workflow to CWL v1.2.
We'll use the bam_chrom_counter CWL workflow similar to the WDL example above to illustrate upgrading, packing and running a CWL workflow. This workflow is written in CWL v1.0 and the top-level Workflow in bam_chrom_counter.cwl calls the two CommandLineTools in slice_bam.cwl and count_bam.cwl.
bam_chrom_counter.cwl

    cwlVersion: v1.0
    id: bam_chrom_counter
    class: Workflow
    requirements:
      - class: ScatterFeatureRequirement
    inputs:
      - id: bam
        type: File
        # upload this local file to the platform and replace the path below with the DNAnexus URI "dx://project-xxx:file-yyyy"
        default: "path/to/my/input_bam"
    outputs:
      - id: bai
        type: File
        outputSource: slice_bam/bai
      - id: count
        type: int[]
        outputSource: count_bam/count
    steps:
      - id: slice_bam
        run: slice_bam.cwl
        in:
          bam: bam
        out: [bai, slices]
      - id: count_bam
        run: count_bam.cwl
        scatter: bam
        in:
          bam: slice_bam/slices
        out: [count]

slice_bam.cwl

    cwlVersion: v1.0
    id: slice_bam
    class: CommandLineTool
    inputs:
      - id: bam
        type: File
      - id: num_chrom
        default: 22
        type: int
    outputs:
      - id: bai
        type: File
        outputBinding:
          glob: $(inputs.bam.basename).bai
      - id: slices
        type: File[]
        outputBinding:
          glob: "slices/*.bam"
    requirements:
      - class: InlineJavascriptRequirement
      - class: ShellCommandRequirement
      - class: DockerRequirement
        dockerPull: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
      - class: InitialWorkDirRequirement
        listing:
          - entryname: slice_bam.sh
            entry: |-
              set -ex
              samtools index $1
              mkdir slices/
              for i in `seq $2`; do
                  samtools view -b $1 -o slices/$i.bam $i
              done
          - entry: $(inputs.bam)
    baseCommand: ["sh", "slice_bam.sh"]
    arguments:
      - position: 0
        valueFrom: $(inputs.bam.basename)
      - position: 1
        valueFrom: $(inputs.num_chrom)
    hints:
      - class: NetworkAccess
        networkAccess: true
      - class: LoadListingRequirement
        loadListing: deep_listing

count_bam.cwl

    cwlVersion: v1.0
    id: count_bam
    class: CommandLineTool
    requirements:
      - class: InlineJavascriptRequirement
      - class: ShellCommandRequirement
      - class: DockerRequirement
        dockerPull: "quay.io/biocontainers/samtools:1.12--hd5e65b6_0"
    inputs:
      - id: bam
        type: File
        inputBinding:
          position: 1
    baseCommand: ["samtools", "view", "-c"]
    outputs:
      - id: count
        type: int
        outputBinding:
          glob: stdout
          loadContents: true
          outputEval: "$(parseInt(self[0].contents))"
    stdout: stdout
    hints:
      - class: NetworkAccess
        networkAccess: true
      - class: LoadListingRequirement
        loadListing: deep_listing

Before compilation, follow the steps below to preprocess these CWL files:
  1. De-localize all local paths referenced in the CWL: if the CWL specifies a local path, e.g. a schema or a default value for a file-type input (like the default path "path/to/my/input_bam" for input bam in bam_chrom_counter.cwl), you need to upload this file to a DNAnexus project and then replace the local path in the CWL with its full DNAnexus URI, e.g. dx://project-XXX:file-YYY.
  2. Install cwl-upgrader and upgrade the CWL files to v1.2 (needed in this case because the CWL files are written in CWL v1.0):

        $ pip3 install cwl-upgrader

        # upgrade all dependent CWL files, which will be saved in the current working directory
        $ cd contrib/beginner_example
        $ cwl-upgrader cwl_v1.0/bam_chrom_counter.cwl cwl_v1.0/slice_bam.cwl cwl_v1.0/count_bam.cwl

  3. Install the sbpack package and run the cwlpack command on the top-level workflow file to build a single packed bam_chrom_counter.cwl.json file containing the top-level workflow and all the steps it depends on:

        $ pip3 install sbpack
        $ cwlpack --add-ids --json bam_chrom_counter.cwl > bam_chrom_counter.cwl.json

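Before compiling, it can be worth checking that the preprocessing steps above actually took effect. The sketch below walks a packed CWL JSON document and flags any cwlVersion other than v1.2 and any File input whose string default is not a dx:// URI. It is a rough illustration, not a replacement for proper validation: find_issues is a hypothetical helper, it only recognizes defaults written as plain strings (as in bam_chrom_counter.cwl), and the inline document is a trimmed stand-in for a real packed file.

```python
import json

def find_issues(node, path="$"):
    """Recursively collect version and local-path problems in a packed CWL document."""
    issues = []
    if isinstance(node, dict):
        version = node.get("cwlVersion")
        if version is not None and version != "v1.2":
            issues.append(f"{path}: cwlVersion is {version}, expected v1.2")
        default = node.get("default")
        if (node.get("type") == "File" and isinstance(default, str)
                and not default.startswith("dx://")):
            issues.append(f"{path}: File default {default!r} is not a dx:// URI")
        for key, value in node.items():
            issues.extend(find_issues(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            issues.extend(find_issues(value, f"{path}[{index}]"))
    return issues

packed = json.loads("""
{"cwlVersion": "v1.0", "class": "Workflow",
 "inputs": [{"id": "bam", "type": "File", "default": "path/to/my/input_bam"}]}
""")
for issue in find_issues(packed):
    print(issue)
```

To check a real file, load it with json.load(open("bam_chrom_counter.cwl.json")) and pass the result to find_issues; an empty list means neither problem was detected.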

Validate the workflow

dxCompiler compiles tools/workflows written according to the CWL v1.2 standard. You can use cwltool --validate to validate the packed CWL file you want to compile.

Compile and run workflow

Once the workflow is upgraded and packed as described above, we can compile it as a DNAnexus workflow and run it.

    $ java -jar dxCompiler.jar compile bam_chrom_counter.cwl.json -project project-xxxx -folder /my/workflows/
    $ dx run bam_chrom_counter -istage-common.bam=project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV

Limitations

  • WDL and CWL
    • Calls with missing arguments have limited support
    • All task and workflow names must be unique across the entire import tree
      • For example, if A.wdl imports B.wdl and A.wdl defines workflow foo, then B.wdl cannot have a workflow or task named foo
    • Subworkflows built from higher-level workflows are not intended to be used on their own
  • WDL only
    • Workflows with forward references (i.e. a variable referenced before it is declared) are not yet supported
    • The alternative workflow output syntax that has been deprecated since WDL draft2 is not supported
    • The call ... after syntax introduced in WDL 1.1 is not yet supported
  • CWL only
    • Calling native DNAnexus apps/applets in a CWL workflow using dxni is not supported
    • SoftwareRequirement and InplaceUpdateRequirement are not yet supported
    • Publishing a dxCompiler-generated workflow as a global workflow is not supported
    • Applet and job reuse is not supported
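
The name-uniqueness limitation is easy to trip over in larger import trees, so a quick pre-compilation scan can help. The sketch below uses a regular expression to find task and workflow declarations in WDL source text and reports names declared more than once. It is a heuristic, not a WDL parser (for instance, it would also match a declaration-like line inside a command block), and duplicate_names is a hypothetical helper. The two inline sources mirror the A.wdl/B.wdl example above.

```python
import re
from collections import defaultdict

# Matches WDL declarations such as "task slice_bam {" or "workflow foo {".
DECL = re.compile(r"^\s*(?:task|workflow)\s+(\w+)\s*\{", re.MULTILINE)

def duplicate_names(sources):
    """Map each name declared more than once to the files declaring it."""
    seen = defaultdict(list)
    for filename, text in sources.items():
        for name in DECL.findall(text):
            seen[name].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}

sources = {
    "A.wdl": 'version 1.0\nimport "B.wdl"\nworkflow foo {\n}\n',
    "B.wdl": ("version 1.0\n"
              "task foo {\n  command <<< echo hi >>>\n}\n"
              "task bar {\n  command <<< echo hi >>>\n}\n"),
}
print(duplicate_names(sources))
```

For real projects, read each file in the import tree into the sources dictionary and rename any reported task or workflow before compiling.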

Help String

dxCompiler's capabilities are outlined in its help string below.

    $ java -jar dxCompiler.jar version
    2.10.1
    $ java -jar dxCompiler.jar help

    java -jar dxCompiler.jar <action> <parameters> [options]

    Actions:
      version
        Prints the dxCompiler version.

      config
        Prints the current dxCompiler configuration.

      describe <DxWorkflow ID>
        Generate the JSON execution tree for a given DNAnexus workflow ID.
        The workflow needs to be have been previously compiled by dxCompiler.
        options
          -pretty                Print exec tree in "pretty" text format instead of JSON.

      compile <WDL or CWL file>
        Compile a WDL or CWL file to a DNAnexus workflow or applet.
        options
          -archive               Archive older versions of applets.
          -compileMode [IR, All]
                                 Compilation mode - If not specified, the compilation
                                 mode is "All" and the compiler will translate WDL or CWL
                                 inputs into DNAnexus workflows and tasks.
                                 Use "IR" if you only want to parse CWL or WDL files and
                                 convert standard-formatted input files to DNAnexus JSON
                                 input format without performing full compilation.
          -defaults <string>     JSON file with standard-formatted default values.
          -defaultInstanceType <string>
                                 The default instance type to use for "helper" applets
                                 that perform runtime evaluation of instance type
                                 requirements. This instance type is also used when
                                 the '-instanceTypeSelection dynamic' option is set.
                                 This value is overriden by any defaults set in set in the
                                 JSON file specified by '-extras'.
          -destination <string>  Full platform path (project:/folder).
          -execTree [json,pretty]
                                 Print a JSON representation of the workflow.
          -extras <string>       JSON file with extra options (see documentation).
          -inputs <string>       JSON file with standard-formatted input values. May be
                                 specified multiple times. A DNAnexus JSON input file is
                                 generated for each standard input file.
          -instanceTypeSelection [static,dynamic]
                                 Whether to select instance types at compile time for tasks with
                                 runtime requirements that can all be statically evaluated
                                 (the default "static" option), or to defer instance type
                                 selection in such cases to runtime (the "dynamic" option).
                                 Using static instance type selection can save time, but it
                                 requires the same set of instances to be accessible during WDL/CWL
                                 compilation and during the runtime of the generated applets and
                                 workflows. Use the "dynamic" option if you plan on creating global
                                 DNAnexus workflows or cloning the generated workflows between
                                 DNAnexus organizations with different available instance types.
          -locked                Create a locked workflow. When running a locked workflow,
                                 input values may only be specified for the top-level workflow.
          -leaveWorkflowsOpen    Leave created workflows open (otherwise they are closed).
          -p | -imports <string> Directory to search for imported WDL or CWL files. May be specified
                                 multiple times.
          -projectWideReuse      Look for existing applets/workflows in the entire project
                                 before generating new ones. The default search scope is the
                                 target folder only.
          -reorg                 Reorganize workflow output files.
          -runtimeDebugLevel [0,1,2]
                                 How much debug information to write to the job log at runtime.
                                 Log the minimum (0), intermediate (1, the default), or all
                                 debug information (2, for internal debugging).
          -separateOutputs       Store the output files of each call in a separate folder. The
                                 default behavior is to put all outputs in the same folder.
          -streamFiles [all,none,perfile]
                                 Whether to mount all files with dxfuse (do not use the
                                 download agent), to mount no files with dxfuse (only use
                                 download agent), or to respect the per-file settings in WDL
                                 parameter_meta sections (default).
          -useManifests          Use manifest files for all workflow and applet inputs and
                                 outputs. Implies -locked.
          -waitOnUpload          Whether to wait for each file upload to complete.

      dxni (WDL only)
        DNAnexus Native call Interface. Creates stubs for calling DNAnexus executables
        (apps/applets/workflows), and stores them as WDL tasks in a local file. Enables
        calling existing platform executables without modification.
        options:
          -apps [include,exclude,only]
                                 Option 'include' includes both apps and applets, 'exclude'
                                 excludes apps and generates applet stubs only, 'only'
                                 generates app stubs only.
          -f | force             Delete any existing output file.
          -o <path>              Destination file for WDL task definitions (defaults to
                                 stdout).
          -path <string>         Name of a specific app or a path to a specific applet.
          -r | recursive         Search recursively for applets in the target folder.

      Common options
        -folder <string>         Platform folder (defaults to '/').
        -project <string>        Platform project (defaults to currently selected project).
        -language <string> [ver] Which language to use? May be WDL or CWL. You can optionally
                                 specify a version. Currently: i. WDL: draft-2, 1.0, and 1.1, and
                                 ii. CWL: 1.2 are supported and WDL development is partially
                                 supported. The default is to auto-detect the language from the
                                 source file.
        -quiet                   Do not print warnings or informational outputs.
        -verbose                 Print detailed logging.
        -verboseKey <module>     Print verbose output only for a specific module. May be
                                 specified multiple times.
        -logFile <path>          File to use for logging output; defaults to stderr.

A detailed description of the advanced dxCompiler features can be found in the public dxCompiler GitHub repository here.