NOTE: Before we begin, you should first download the DNAnexus SDK and run through the Command-Line Quickstart if you haven't already. If this is your first time writing a DNAnexus app, we recommend you first go through the Intro to Building Apps Tutorial before diving into this one.
First, create dxapp.json. This file should be located in the root directory of the app directory, as shown in the structure above.
dxapp.json is a DNAnexus application metadata file. Its presence in a directory tells DNAnexus tools that the directory contains DNAnexus applet source code. We explain selected fields of this file below.
The applet is named sambamba_merge_applet
(field: name). Under the inputSpec field, we specify that the app takes 2 inputs:
sorted_bams: an array of BAM files
advanced_options: an optional string of advanced command-line options to be passed to the Sambamba merge command
Under the outputSpec field, we specify that the app always returns 1 output:
merged_bam: a single merged BAM file
The sorted_bams input and merged_bam output should contain filenames that match the pattern "*.bam". This specification tells the web UI to show only files that match this pattern when selecting input files.
We specify under runSpec
that this is a bash script (field: interpreter) and that the worker running the applet should execute the script located in the applet directory at src/script.sh (field: file).
In the nested fields runSpec, systemRequirements, *, instanceType, we specify that all entry points of the applet should run on the mem2_ssd1_v2_x4 instance type.
Place the Sambamba executable in the resources/usr/bin/
directory of your app directory.
The shell script is named script.sh and located in the applet directory at the path src/script.sh. This location is important because it is the location specified in the dxapp.json above.
The -e flag causes bash to exit as soon as a command fails, the -o pipefail flag makes a pipeline return an error if any command within it fails, and the -x flag causes bash to print each command as it is executed, which is useful for debugging.
The script runs with -e set by default. If you wish to keep the script running to the end regardless of any errors that occur during execution, use set +e at the beginning of the script.
Download the input files using the dx-download-all-inputs
command-line utility. Add a line that calls it to your script.sh.
The input files are downloaded to $HOME/in/. Each file input parameter specified under inputSpec in the dxapp.json has its own folder under the $HOME/in/ directory. For this applet, there is one folder for the sorted_bams input, at the path $HOME/in/sorted_bams/. Since sorted_bams is an array of files, the files are placed into numbered subdirectories under the parent directory $HOME/in/sorted_bams/. For example, if the user supplied the following 3 files to the applet, SRR100022_chrom20_mapped_to_b37.bam, SRR100022_chrom21_mapped_to_b37.bam, and SRR100022_chrom22_mapped_to_b37.bam, in that order, the files would be downloaded to the following paths, respectively:
$HOME/in/sorted_bams/0/SRR100022_chrom20_mapped_to_b37.bam
$HOME/in/sorted_bams/1/SRR100022_chrom21_mapped_to_b37.bam
$HOME/in/sorted_bams/2/SRR100022_chrom22_mapped_to_b37.bam
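The numbered-subdirectory layout described above can be reproduced locally to see its shape. The sketch below is only a simulation under stated assumptions: it does not call dx-download-all-inputs, and instead builds the same directory structure with empty placeholder files inside a temporary directory. The variable names sandbox and layout are hypothetical and are not set by the platform.

```bash
#!/bin/bash
set -e

# Simulation only: a throwaway directory stands in for a worker's $HOME.
# No dx utilities are run here.
sandbox="$(mktemp -d)"

# The three example filenames from the text, in the order the user supplied
# them. Each element of an array-of-files input gets its own numbered
# subdirectory, starting from 0.
i=0
for name in \
  SRR100022_chrom20_mapped_to_b37.bam \
  SRR100022_chrom21_mapped_to_b37.bam \
  SRR100022_chrom22_mapped_to_b37.bam
do
  mkdir -p "$sandbox/in/sorted_bams/$i"
  : > "$sandbox/in/sorted_bams/$i/$name"   # empty placeholder file
  i=$((i + 1))
done

# List the resulting files relative to the sandbox, sorted for stable output.
layout="$(cd "$sandbox" && find in -type f | sort)"
echo "$layout"
```

Keeping one subdirectory per array element means two inputs with the same filename would not overwrite each other, and the numbering records the order in which the files were supplied.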
We create the output subdirectory $HOME/out/merged_bam, which corresponds to the merged_bam output parameter in the dxapp.json. Later, we will place the output of Sambamba merge, a merged BAM file, into this subdirectory.
Outputs are uploaded with dx-upload-all-outputs. This utility automatically uploads all files found under the path $HOME/out/ and links each file to the appropriate output parameter (the outputs specified under outputSpec in the dxapp.json).
In this applet, files in $HOME/out/merged_bam/ will be uploaded as the merged_bam output parameter of the job.
NOTE: The execution of an applet on a worker starts inside $HOME, so in this tutorial $HOME/out, ~/out, and out/ are all the same since we have not changed directories.
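To make the output convention concrete, the sketch below simulates it locally. It is an illustration under stated assumptions, not the behavior of dx-upload-all-outputs itself: HOME is pointed at a temporary directory, the file example_merged.bam is a hypothetical placeholder, and nothing is uploaded. It checks that the three spellings of the output directory resolve to the same place, then reports which output parameter each file would be associated with, going by the name of its subdirectory.

```bash
#!/bin/bash
set -e

# Simulation only: use a throwaway directory as HOME so this can run anywhere.
# On a real worker, HOME is already set and the job starts inside it.
export HOME="$(mktemp -d)"
cd "$HOME"

# One subdirectory per output parameter; merged_bam is the only one here.
mkdir -p out/merged_bam

# Placeholder standing in for the merged BAM file (hypothetical filename).
: > out/merged_bam/example_merged.bam

# Because the working directory is $HOME, these three spellings resolve to
# the same directory.
a="$(cd "$HOME/out" && pwd -P)"
b="$(cd ~/out && pwd -P)"
c="$(cd out && pwd -P)"

# Derive the output parameter name from the subdirectory each file sits in.
mapping=""
for f in out/*/*; do
  param="$(basename "$(dirname "$f")")"
  mapping="${mapping}${param}=$(basename "$f")"
done
echo "$mapping"
```

If the script later changes directory, only the $HOME/out and ~/out spellings keep pointing at the upload location, so the absolute forms are the safer choice in longer scripts.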
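We use the $sorted_bams_prefix variable to help us name our output file. This variable is provided for every file or array:file input parameter specified in the applet's dxapp.json.
The sorted_bams input is an array:file. The variable $sorted_bams_prefix is therefore a bash array holding the filename of every file in the file array, with the extension stripped off along with any .gz extension (if applicable).
For example, for inputs named NA12878.chr1.bam and NA12878.chr2.bam, the first item ${sorted_bams_prefix[0]} will be NA12878.chr1, the second item ${sorted_bams_prefix[1]} will be NA12878.chr2, etc.
Add a line to your script.sh that uses this variable to set the output name.
The Sambamba merge command receives the options the user supplied in $advanced_options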
during app initialization.
The output is named using the $output_name bash variable set in the section above.
We use the $sorted_bams_path variable to help us pass the input files to the executable.
This variable is provided for every file or array:file input parameter specified in the dxapp.json. It stores the full path of each input file, assuming that the file was downloaded using dx-download-all-inputs.
The $sorted_bams_path variable is a bash array containing the paths of the files given as input to sorted_bams, in the order they were given to the app. "${sorted_bams_path[@]}" expands to one word per array element, so paths that contain whitespace are passed intact.
For the input files NA12878.chr1.bam, NA12878.chr2.bam, and NA12878.chr3.bam, the interpreted sambamba merge command receives the paths of these three files, in that order, as separate arguments.
The merged BAM file is written to the out/merged_bam/ folder to be uploaded using dx-upload-all-outputs. This utility uploads the contents of the subdirectories under the path $HOME/out/.
In this tutorial, we manually created the app directory, dxapp.json, and shell script (src/script.sh). However, you can automate this step by using the dx-app-wizard
as explained in the Intro to Building Apps tutorial.dx-app-wizard
will prompt you for inputs, and automatically creates the dxapp.json
based on your answers and a template file for your shell script. However, the app wizard was not intended to be a tool for the advanced developer. Thus, it does not prompt you for more advanced fields in the applet specification such as patterns
, and instanceType
. Additionally, it does not use the dx-download-all-inputs
or dx-upload-all-outputs
utilities.dxapp.json
. Afterwards, you can then go back in and add additional fields to the dxapp.json
and replace the template bash script with your own.dx get
command to reconstruct and download the source directory of open-source apps (e.g. dx get app-cloud_workstation
). You can find open-source apps with the command below