The following sections introduce a few use cases for dxCompiler. This tool can be downloaded from the dxCompiler GitHub repository, which also contains more advanced options including DNAnexus extensions and integration with private Docker repositories.
dxCompiler Setup
For information on how to set up dxCompiler, see the instructions for dxCompiler setup on GitHub.
WDL Example
Validating a Workflow
dxCompiler uses wdlTools, a parser that adheres strictly to the WDL specifications. Most of the problematic automatic type conversions that are allowed by other WDL runtime engines are not allowed by dxCompiler. Use the command line tools in wdlTools to validate your WDL files before trying to compile them with dxCompiler. Tools like check and lint are especially useful for this validation.
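The validation step can be sketched as follows. This is a minimal example, assuming the wdlTools release jar was saved in the working directory as wdlTools.jar (a hypothetical file name; release jars usually carry a version suffix) and that the workflow source is bam_chrom_counter.wdl:

```shell
# Assumed locations; adjust to where you saved the jar and the WDL source.
WDLTOOLS_JAR="wdlTools.jar"
WDL_FILE="bam_chrom_counter.wdl"

if [ -f "$WDLTOOLS_JAR" ] && [ -f "$WDL_FILE" ]; then
  # Run the type checker first, then the linter.
  for action in check lint; do
    java -jar "$WDLTOOLS_JAR" "$action" "$WDL_FILE"
  done
else
  echo "$WDLTOOLS_JAR or $WDL_FILE not found; nothing validated" >&2
fi
```

Fixing everything that check reports before compiling tends to be quicker than debugging a failed compilation.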
Compiling and Running a Workflow
The bam_chrom_counter workflow is written in WDL. Task slice_bam splits a BAM file into an array of sub-files. Task count_bam counts the number of alignments in a BAM file. The workflow takes an input BAM file, calls slice_bam to split it by chromosome, and calls count_bam in parallel on each per-chromosome file. The results comprise a BAM index file and an array with the number of reads per chromosome.
From the command line, compile the workflow to the DNAnexus Platform using the dxCompiler.jar file.
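A compile invocation might look like the sketch below. It assumes the jar is saved as dxCompiler.jar in the working directory and that the WDL source is bam_chrom_counter.wdl in the same place; project-xxxx and /my/workflows/ are placeholders for your own project ID and folder.

```shell
# Placeholders; replace with your own project ID, folder, and source path.
PROJECT="project-xxxx"
FOLDER="/my/workflows/"
WDL_FILE="bam_chrom_counter.wdl"

COMPILE_CMD="java -jar dxCompiler.jar compile $WDL_FILE -project $PROJECT -folder $FOLDER"
echo "$COMPILE_CMD"

# Run the compiler only when the jar is present in the working directory.
if [ -f dxCompiler.jar ]; then
  $COMPILE_CMD
fi
```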
This compiles the source WDL file to platform objects in the specified DNAnexus project project-xxxx, under the folder /my/workflows/. The compiled objects are:
A workflow bam_chrom_counter
Two applets that can be called independently: slice_bam and count_bam
A few auxiliary applets that process workflow inputs and outputs and launch the scatter
You can review the structure of the compiled DNAnexus workflow using the dxCompiler describe command. The output shows the generated DNAnexus workflows and applets as a tree of caller/callee relationships.
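As an illustration, the describe action with the -pretty option could be invoked as sketched here. The workflow ID workflow-xxxx is a hypothetical placeholder; substitute the ID of your compiled workflow.

```shell
# Hypothetical workflow ID; use the ID of the workflow you compiled.
WORKFLOW_ID="workflow-xxxx"

DESCRIBE_CMD="java -jar dxCompiler.jar describe $WORKFLOW_ID -pretty"
echo "$DESCRIBE_CMD"

# Print the execution tree only when the jar is present in the working directory.
if [ -f dxCompiler.jar ]; then
  $DESCRIBE_CMD
fi
```

Without -pretty, the same action emits the tree as JSON, which is easier to post-process.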
The generated workflow can be executed from the UI or via the DNAnexus command-line client. For example, to run the workflow with the input BAM file project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV, use the following command:
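A launch from the command line might look like this sketch. The input name stage-common.bam is an assumption about how the compiled workflow exposes its bam input (check dx describe output for your workflow), and LAUNCH is a hypothetical opt-in variable so that the snippet only prints the command by default.

```shell
INPUT_BAM="project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV"

# Assumption: the workflow input is exposed as stage-common.bam.
RUN_CMD="dx run bam_chrom_counter -istage-common.bam=$INPUT_BAM"
echo "$RUN_CMD"

# Launch only on request, since this starts a billable analysis.
if [ "${LAUNCH:-no}" = "yes" ]; then
  $RUN_CMD
fi
```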
The snapshot below shows what you see from the UI when the workflow execution is completed:
CWL Example
Preprocessing a CWL Workflow
dxCompiler requires the source CWL file to be "packed" as a cwl.json file, which contains a single compound workflow with all the dependent processes included. You might need to upgrade the version of your workflow to CWL v1.2.
The bam_chrom_counter CWL workflow, which is similar to the WDL example above, illustrates how to upgrade, pack, and run a CWL workflow. This workflow is written in CWL v1.0, and the top-level Workflow in bam_chrom_counter.cwl calls the two CommandLineTool tools in slice_bam.cwl and count_bam.cwl.
Before compilation, follow the steps below to preprocess these CWL files:
De-localize all local paths referenced in the CWL: if the CWL specifies a local path, for example, a schema or a default value for a file-type input (like the default path "path/to/my/input_bam" for input bam in bam_chrom_counter.cwl), you need to upload this file to a DNAnexus project and then replace the local path in the CWL with its full DNAnexus URI, for example, dx://project-XXX:file-YYY.
Install cwl-upgrader and upgrade the CWL files to v1.2 (needed in this case as CWL files are in CWL v1.0):
Install the sbpack package and run the cwlpack command on the top-level workflow file to build a single packed bam_chrom_counter.cwl.json file containing the top-level workflow and all the steps it depends on:
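The de-localizing and packing steps can be sketched as below. The helper to_dx_uri is hypothetical and only shows the URI shape described in the first step. The --add-ids and --json flags are assumed from sbpack's usage text (confirm with cwlpack --help), and the upgraded bam_chrom_counter.cwl is assumed to be in the working directory.

```shell
# Hypothetical helper: build the DNAnexus URI that replaces a local path.
to_dx_uri() {
  echo "dx://$1:$2"
}
to_dx_uri project-XXX file-YYY   # placeholder IDs from the text

# Assumed file names: upgraded v1.2 workflow in, packed JSON out.
TOP_LEVEL="bam_chrom_counter.cwl"
PACKED="bam_chrom_counter.cwl.json"

# Pack only when cwlpack (installed with: pip3 install sbpack) and the input exist.
if command -v cwlpack >/dev/null 2>&1 && [ -f "$TOP_LEVEL" ]; then
  cwlpack --add-ids --json "$TOP_LEVEL" > "$PACKED"
fi
```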
Validating a Workflow
dxCompiler compiles tools/workflows written according to the CWL v1.2 standard. You can use cwltool --validate to validate the packed CWL file you want to compile.
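For instance, validating the packed file could look like the following sketch, which assumes cwltool is installed and that the packed file is named bam_chrom_counter.cwl.json in the working directory:

```shell
PACKED="bam_chrom_counter.cwl.json"   # assumed name of the packed workflow

if command -v cwltool >/dev/null 2>&1 && [ -f "$PACKED" ]; then
  # Validate the document without executing it.
  cwltool --validate "$PACKED"
else
  echo "cwltool or $PACKED not available; skipping validation" >&2
fi
```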
Compiling and Running a Workflow
Once the workflow is upgraded and packed as described above, compile it as a DNAnexus workflow and run it.
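The two commands might look like the sketch below. The project ID and folder are placeholders, the input name stage-common.bam is an assumption about the compiled workflow's input, and EXECUTE is a hypothetical opt-in variable; by default the snippet only prints the commands.

```shell
PACKED="bam_chrom_counter.cwl.json"
PROJECT="project-xxxx"      # placeholder project ID
FOLDER="/my/workflows/"     # placeholder destination folder
INPUT_BAM="project-BQbJpBj0bvygyQxgQ1800Jkk:file-FpQKQk00FgkGV3Vb3jJ8xqGV"

CWL_COMPILE_CMD="java -jar dxCompiler.jar compile $PACKED -project $PROJECT -folder $FOLDER"
# Assumption: the workflow input is exposed as stage-common.bam.
CWL_RUN_CMD="dx run bam_chrom_counter -istage-common.bam=$INPUT_BAM"
printf '%s\n' "$CWL_COMPILE_CMD" "$CWL_RUN_CMD"

# Compile and launch only on request.
if [ "${EXECUTE:-no}" = "yes" ]; then
  $CWL_COMPILE_CMD && $CWL_RUN_CMD
fi
```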
Limitations
WDL only
The call ... after syntax introduced in WDL 1.1 is not yet supported.
CWL only
Calling native DNAnexus apps/applets in a CWL workflow using dxni is not supported.
SoftwareRequirement and InplaceUpdateRequirement are not yet supported.
Publishing a dxCompiler-generated workflow as a global workflow is not supported.
Applet and job reuse is not supported.
Help String
You can learn about dxCompiler capabilities by running java -jar dxCompiler.jar without arguments or with the help action.
Dive deeper into dxCompiler
To explore additional compiler options and features, refer to the Expert options in the dxCompiler repository on GitHub. That documentation covers the advanced features, including DNAnexus extensions, publishing global workflows, and more.
# install cwl-upgrader (used in the CWL preprocessing steps to upgrade v1.0 files to v1.2)
pip3 install cwl-upgrader
# upgrade all dependent CWL files, which will be saved in the current working directory
cd contrib/beginner_example
cwl-upgrader cwl_v1.0/bam_chrom_counter.cwl cwl_v1.0/slice_bam.cwl cwl_v1.0/count_bam.cwl
java -jar dxCompiler.jar <action> <parameters> [options]
Actions:
  version
    Prints the dxCompiler version.
  config
    Prints the current dxCompiler configuration.
  describe <DxWorkflow ID>
    Generate the JSON execution tree for a given DNAnexus workflow ID.
    The workflow needs to have been previously compiled by dxCompiler.
    options
      -pretty                 Print exec tree in "pretty" text format instead of JSON.
  compile <WDL or CWL file>
    Compile a WDL or CWL file to a DNAnexus workflow or applet.
    options
      -archive                Archive older versions of applets.
      -compileMode [IR, All]
                              Compilation mode - If not specified, the compilation
                              mode is "All" and the compiler will translate WDL or CWL
                              inputs into DNAnexus workflows and tasks.
                              Use "IR" if you only want to parse CWL or WDL files and
                              convert standard-formatted input files to DNAnexus JSON
                              input format without performing full compilation.
      -defaults <string>      JSON file with standard-formatted default values.
      -defaultInstanceType <string>
                              The default instance type to use for "helper" applets
                              that perform runtime evaluation of instance type
                              requirements. This instance type is also used when
                              the '-instanceTypeSelection dynamic' option is set.
                              This value is overridden by any defaults set in the
                              JSON file specified by '-extras'.
      -destination <string>   Full platform path (project:/folder).
      -execTree [json,pretty]
                              Print a JSON representation of the workflow.
      -executableCreationParallelism <int>
                              The maximum number of platform executables that dxCompiler can
                              create in parallel, defaults to 1.
      -extras <string>        JSON file with extra options (see documentation).
      -inputs <string>        JSON file with standard-formatted input values. May be
                              specified multiple times. A DNAnexus JSON input file is
                              generated for each standard input file.
      -instanceTypeSelection [static,dynamic]
                              Whether to select instance types at compile time for tasks with
                              runtime requirements that can all be statically evaluated
                              (the default "static" option), or to defer instance type
                              selection in such cases to runtime (the "dynamic" option).
                              Using static instance type selection can save time, but it
                              requires the same set of instances to be accessible during WDL/CWL
                              compilation and during the runtime of the generated applets and
                              workflows. Use the "dynamic" option if you plan on creating global
                              DNAnexus workflows or cloning the generated workflows between
                              DNAnexus organizations with different available instance types.
      -locked                 Create a locked workflow. When running a locked workflow,
                              input values may only be specified for the top-level workflow.
      -leaveWorkflowsOpen     Leave created workflows open (otherwise they are closed).
      -p | -imports <string>  Directory to search for imported WDL or CWL files. May be specified
                              multiple times.
      -projectWideReuse       Look for existing applets/workflows in the entire project
                              before generating new ones. The default search scope is the
                              target folder only.
      -reorg                  Reorganize workflow output files.
      -runtimeDebugLevel [0,1,2]
                              How much debug information to write to the job log at runtime.
                              Log the minimum (0), intermediate (1, the default), or all
                              debug information (2, for internal debugging).
      -separateOutputs        Store the output files of each call in a separate folder. The
                              default behavior is to put all outputs in the same folder.
      -streamFiles [all,none,perfile]
                              Whether to mount all files with dxfuse (do not use the
                              download agent), to mount no files with dxfuse (only use
                              download agent), or to respect the per-file settings in WDL
                              parameter_meta sections (default).
      -useManifests           Use manifest files for all workflow and applet inputs and
                              outputs. Implies -locked.
      -waitOnUpload           Whether to wait for each file upload to complete.
  dxni (WDL only)
    DNAnexus Native call Interface. Creates stubs for calling DNAnexus
    apps/applets, and stores them as WDL tasks in a local file. Enables
    calling existing platform executables without modification.
    options:
      -apps [include,exclude,only]
                              Option 'include' includes both apps and applets, 'exclude'
                              excludes apps and generates applet stubs only, 'only'
                              generates app stubs only.
      -f | force              Delete any existing output file.
      -o <path>               Destination file for WDL task definitions (defaults to
                              stdout).
      -path <string>          Name of a specific app or a path to a specific applet.
                              For recursive search in folders use -folder with -r flag.
      -r | recursive          Search recursively for applets in the target folder.
Common options
      -folder <string>        Platform folder (defaults to '/').
      -project <string>       Platform project (defaults to currently selected project).
      -language <string> [ver]
                              Which language to use? May be WDL or CWL. You can optionally
                              specify a version. Currently supported: WDL draft-2, 1.0,
                              and 1.1; CWL 1.2. WDL development is partially supported.
                              The default is to auto-detect the language from the
                              source file.
      -quiet                  Do not print warnings or informational outputs.
      -verbose                Print detailed logging.
      -verboseKey <module>    Print verbose output only for a specific module. May be
                              specified multiple times.
      -logFile <path>         File to use for logging output; defaults to stderr.