Advanced Applet Tutorial
Learn to use Sambamba to create advanced Bash applets for use on the DNAnexus Platform.
In this tutorial, you'll learn to create an advanced Bash applet. Your applet will use Sambamba, an open-source toolkit, to merge multiple BAM files into a single file.
If you're not familiar with Sambamba, review the Sambamba documentation.
Download the DNAnexus SDK if you haven't already done so.
If you're not familiar with the dx command-line client, review its documentation.
If this is your first time writing an app for use on the DNAnexus Platform, also review the introductory documentation on building apps.
When you create an applet for use on the Platform, you start by creating a local directory structure to hold your source code and other resources. For this tutorial, you'll need to create the following directory structure:
To do this, open a terminal app on your local machine, navigate to a directory where you want to build your applet, then enter the following commands:
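The commands themselves are not reproduced here. A minimal sketch, assuming the layout is the applet root sambamba_merge_applet with a src subdirectory for the script and a resources/usr/bin subdirectory for the Sambamba binary (both paths are referenced later in this tutorial):

```shell
# Create the applet root with its script and resources subdirectories.
# -p creates any missing parent directories and does not fail if they already exist.
mkdir -p sambamba_merge_applet/src
mkdir -p sambamba_merge_applet/resources/usr/bin

# List the resulting directories to confirm the layout.
find sambamba_merge_applet -type d | sort
```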
Next, use a text editor to create a file called dxapp.json and, as shown in the previous section, save it in the applet's root directory, sambamba_merge_applet.
The dxapp.json file is a DNAnexus application metadata file. Its presence in a directory tells DNAnexus tools that the directory contains DNAnexus applet source code. Its fields contain information about the applet and specifications for how it will be run.
The file's structure and content should be as follows:
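The file's content is not reproduced here. The following is a sketch reconstructed from the field descriptions in this section; it writes the file with a heredoc so that the example stays in Bash. The class values, the dxapi value, the distribution, release, and version values, and the "*" entry-point key under systemRequirements are assumptions not stated in this tutorial, so check them against the current dxapp.json documentation before building.

```shell
# Assumptions (not from the tutorial text): "dxapi", the "class" values,
# "distribution"/"release"/"version", and the "*" key in systemRequirements.
mkdir -p sambamba_merge_applet
cat > sambamba_merge_applet/dxapp.json <<'EOF'
{
  "name": "sambamba_merge_applet",
  "dxapi": "1.0.0",
  "inputSpec": [
    {
      "name": "sorted_bams",
      "class": "array:file",
      "patterns": ["*.bam"]
    },
    {
      "name": "advanced_options",
      "class": "string",
      "optional": true
    }
  ],
  "outputSpec": [
    {
      "name": "merged_bam",
      "class": "file",
      "patterns": ["*.bam"]
    }
  ],
  "runSpec": {
    "interpreter": "bash",
    "file": "src/script.sh",
    "distribution": "Ubuntu",
    "release": "20.04",
    "version": "0",
    "systemRequirements": {
      "*": { "instanceType": "mem2_ssd1_v2_x4" }
    }
  }
}
EOF
```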
The applet's name, sambamba_merge_applet, is specified in the name field.
The inputSpec field is an array containing two objects. Each provides specs for one of the two inputs taken by the applet:
sorted_bams - This input consists of an array of BAM files. As indicated in the patterns subfield, each of these files must have a name that ends with the extension .bam.
The outputSpec field is also an array, containing a single object that provides specs for the applet's single output:
merged_bam - A single BAM file, which, as indicated in the patterns subfield, must have a name that ends with the extension .bam.
The value "bash" in the runSpec field's interpreter subfield specifies that the applet is a Bash script.
The value "src/script.sh" in the runSpec field's file subfield specifies that the worker running the applet should run the executable script.sh, located in the applet's src subdirectory.
You can use the access key in your applet's dxapp.json file to configure its ability to access the internet. See the following documentation for more information:
Now run the following commands:
The next step is to write the script that will be executed when the applet is launched.
Start by using a text editor to create a file named script.sh and save it in the applet's src subdirectory.
Add these two lines at the top of the file:
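Based on the descriptions in this section (an interpreter line, followed by a line of settings using -e, -x, and -o pipefail), the two lines are most likely the following. The exact spelling used in the original listing is an assumption.

```shell
#!/bin/bash
set -e -x -o pipefail
```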
The second line contains settings to be used in executing the script:
The -e flag makes the script exit as soon as a command returns a non-zero (error) status.
The -o pipefail option makes a pipeline return an error status if any command in it fails, not only the last one.
The -x flag makes Bash print each command as it is executed. This is useful for debugging.
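To see why -o pipefail matters, the following sketch compares the exit status of the same pipeline with the option off and then on. The variable names are illustrative only.

```shell
#!/bin/bash
# With pipefail off, a pipeline reports only the status of its last command.
set +o pipefail
false | true
status_without=$?

# With pipefail on, a failure anywhere in the pipeline is reported.
set -o pipefail
status_with=0
false | true || status_with=$?

echo "without pipefail: $status_without, with pipefail: $status_with"
```

This prints "without pipefail: 0, with pipefail: 1". Combined with -e, the second behavior means a failed step in the middle of a pipeline stops the script instead of passing unnoticed.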
On the DNAnexus Platform, an applet executes on a worker in the directory given by the environment variable $HOME. To be accessible to the execution, inputs need to be in the subdirectory $HOME/in. To have your input files automatically downloaded to $HOME/in, add a line to your script that calls dx-download-all-inputs.
Note that you named your applet's first input sorted_bams, and defined it as an array of files. As such, when your input files are downloaded to $HOME/in, they will be placed in a subdirectory $HOME/in/sorted_bams. Each file will be placed in a separate subdirectory within $HOME/in/sorted_bams, with these subdirectories named with consecutive integers starting at 0, like the indices of an array.
For example, if you supply three files to the applet, named SRR100022_chrom20_mapped_to_b37.bam, SRR100022_chrom21_mapped_to_b37.bam, and SRR100022_chrom22_mapped_to_b37.bam, they will be downloaded to:
$HOME/in/sorted_bams/0/SRR100022_chrom20_mapped_to_b37.bam
$HOME/in/sorted_bams/1/SRR100022_chrom21_mapped_to_b37.bam
$HOME/in/sorted_bams/2/SRR100022_chrom22_mapped_to_b37.bam
The file structure for the inputs will then be as follows:
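The layout can be reproduced locally to make it concrete. This sketch does not call dx-download-all-inputs; it creates empty placeholder files in a temporary directory that stands in for $HOME (the variable name fake_home is hypothetical), then lists them.

```shell
#!/bin/bash
# Stand-in for the worker's $HOME; nothing here touches the DNAnexus Platform.
fake_home=$(mktemp -d)

# Recreate the numbered subdirectories 0, 1, 2 that hold each array element.
i=0
for chrom in 20 21 22; do
    mkdir -p "$fake_home/in/sorted_bams/$i"
    touch "$fake_home/in/sorted_bams/$i/SRR100022_chrom${chrom}_mapped_to_b37.bam"
    i=$((i + 1))
done

# Print the file paths relative to the stand-in home directory.
layout=$(cd "$fake_home" && find in -type f | sort)
echo "$layout"
```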
Next, add the following line to your script, creating a directory $HOME/out/merged_bam to store your output file:
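The line in question is most likely a single mkdir call. A sketch, assuming the -p flag is used so that the parent out directory is created as well:

```shell
# Create the output directory; its name must match the merged_bam output parameter.
mkdir -p "$HOME/out/merged_bam"
```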
Note that the directory's name, merged_bam, corresponds to the name of your applet's output parameter, as specified in the dxapp.json file. This ensures that your applet will treat files in this directory as outputs, and then automatically upload them after execution has finished.
The line that sets the output filename gives the file a name with the following components:
${sorted_bams_prefix[0]} - This adds to the output filename the prefix of the first input file, that is, the file's name with its extension or extensions removed. sorted_bams is the name of the array input that contains your files; the _prefix suffix and the [0] index select the prefix of the file stored in the first element of that array. Wrapping the expression in ${} makes Bash expand it to the element's value rather than treat it as literal text.
_merged - This string adds a descriptive element to the filename, making clear that the file consists of multiple files merged into one.
.bam - This extension identifies the file as a BAM (Binary Alignment Map) file.
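Putting the three components together gives an assignment along the following lines. On a worker, the Platform populates sorted_bams_prefix; here the array is filled in by hand with assumed values so that the snippet runs anywhere.

```shell
#!/bin/bash
# Simulated helper variable: on a worker this array is set by the Platform.
sorted_bams_prefix=("SRR100022_chrom20_mapped_to_b37" "SRR100022_chrom21_mapped_to_b37")

# Prefix of the first input, then "_merged", then the ".bam" extension.
output_name="${sorted_bams_prefix[0]}_merged.bam"
echo "$output_name"
```

With the simulated values above, this prints SRR100022_chrom20_mapped_to_b37_merged.bam.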
Next add a line to your script that will launch Sambamba and have it merge your input files into a single file:
To break this down:
sambamba merge invokes Sambamba's merge function.
"$output_name" gives the output file the name that, in the previous step, you stored in the variable output_name.
When your script runs, Bash expands the variables you've included in the merge command. So if, for example, you have three input files named NA12878.chr1.bam, NA12878.chr2.bam, and NA12878.chr3.bam, Bash will interpret your code as follows:
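The expansion can be previewed without Sambamba installed by prefixing the command with echo. In this sketch the three variables are set by hand to simulate what the Platform provides on a worker; the value given to output_name is an assumption about how the prefix of NA12878.chr1.bam is derived.

```shell
#!/bin/bash
# Simulated values; on a worker these come from the applet's inputs and helper variables.
advanced_options=""
output_name="NA12878.chr1_merged.bam"   # assumed prefix "NA12878.chr1"
sorted_bams_path=(
    "$HOME/in/sorted_bams/0/NA12878.chr1.bam"
    "$HOME/in/sorted_bams/1/NA12878.chr2.bam"
    "$HOME/in/sorted_bams/2/NA12878.chr3.bam"
)

# echo prints the fully expanded command instead of running it.
# $advanced_options is unquoted so that an empty value disappears entirely.
expanded=$(echo sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}")
echo "$expanded"
```

The printed line shows the output filename followed by one path per input file, each under its own numbered subdirectory of $HOME/in/sorted_bams.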
You can use the following merge command in your shell script, if for some reason you don't want it to leverage DNAnexus Platform environment variables:
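One way to do this, offered as a sketch rather than the tutorial's original command, is to collect the inputs with a shell glob over the download directory. The output filename merged.bam and the placeholder input names are hypothetical, and echo is used so that the snippet prints the command instead of requiring Sambamba.

```shell
#!/bin/bash
# Stand-in input tree so the glob has something to match outside a worker.
work=$(mktemp -d)
for i in 0 1 2; do
    mkdir -p "$work/in/sorted_bams/$i"
    touch "$work/in/sorted_bams/$i/sample$i.bam"
done

# Collect every BAM file inside the numbered subdirectories.
inputs=("$work"/in/sorted_bams/*/*.bam)

# On a worker, replace "$work" with "$HOME" and drop the leading echo.
echo sambamba merge merged.bam "${inputs[@]}"
```

Note that a glob sorts directory names lexically, so with more than ten inputs the subdirectory 10 is listed before 2.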
After Sambamba merges your input files, the output file needs to be moved to the $HOME/out/merged_bam directory on the worker. To provide for this, add the following line to your script:
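A sketch of that line, with a placeholder file created first so that it can be tried outside a worker. The filename assigned to output_name is an assumed example.

```shell
#!/bin/bash
# Placeholder standing in for the file Sambamba would have written.
output_name="example_merged.bam"
touch "$output_name"

# Move the merged file into the directory named after the merged_bam output.
mkdir -p "$HOME/out/merged_bam"
mv "$output_name" "$HOME/out/merged_bam"
```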
You've completed your script. It should read as follows:
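Assembled from the steps described in this tutorial, the script would look roughly like this. It is a reconstruction, not a verbatim copy of the original listing. The snippet writes the script to src/script.sh with a heredoc and then asks Bash to check its syntax without executing it, since dx-download-all-inputs, sambamba, and dx-upload-all-outputs are available only on a suitably equipped machine.

```shell
mkdir -p sambamba_merge_applet/src
cat > sambamba_merge_applet/src/script.sh <<'EOF'
#!/bin/bash
set -e -x -o pipefail

# Download all inputs to $HOME/in.
dx-download-all-inputs

# Directory named after the merged_bam output parameter.
mkdir -p "$HOME/out/merged_bam"

# Name the output after the first input's prefix.
output_name="${sorted_bams_prefix[0]}_merged.bam"

# $advanced_options is left unquoted on purpose so that it splits into separate options.
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"

mv "$output_name" "$HOME/out/merged_bam"

# Upload everything under $HOME/out.
dx-upload-all-outputs
EOF

# Parse the script without running it; reports success only if the syntax is valid.
bash -n sambamba_merge_applet/src/script.sh && echo "syntax OK"
```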
You're ready to build and run your applet.
If you haven't yet done so, log in to the DNAnexus Platform from your terminal app with dx login. Then use dx select to choose the project in which you'd like to work.
If you choose to upload your own data, test your applet by doing an initial run that uses small files.
Now use dx build to build your applet and dx run to run it. In your terminal app, enter the following commands:
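A sketch of those commands follows. It assumes you are in the directory that contains sambamba_merge_applet, that you are logged in with a project selected, and that the three example BAM files named earlier in this tutorial have been uploaded to that project; substitute your own file names or IDs. The guard around the commands and the dx_available variable are additions so that the snippet exits cleanly on a machine without the SDK.

```shell
#!/bin/bash
if command -v dx >/dev/null 2>&1; then
    dx_available=yes
    # Build the applet from its source directory and upload it to the current project.
    dx build sambamba_merge_applet
    # Repeat -i once per element of the sorted_bams array input.
    dx run sambamba_merge_applet \
        -isorted_bams=SRR100022_chrom20_mapped_to_b37.bam \
        -isorted_bams=SRR100022_chrom21_mapped_to_b37.bam \
        -isorted_bams=SRR100022_chrom22_mapped_to_b37.bam
else
    dx_available=no
    echo "dx not found; install the DNAnexus SDK first" >&2
fi
```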
Note that dx-app-wizard does have certain limitations. It does not, for example, prompt you to provide advanced configuration settings, such as instanceType specs or the patterns settings you added here to your applet's inputSpec and outputSpec definitions. In addition, it does not use either dx-download-all-inputs or dx-upload-all-outputs.
dx-app-wizard can still be a useful way to speed up your work, even if you want the advanced functionality available when you perform all the steps of creating an applet manually. You can, for example, use dx-app-wizard to create the applet's directory structure and a basic dxapp.json file. Then use a text editor to add fields to the dxapp.json file as needed, and replace the Bash script created by dx-app-wizard with your own script.
To get a list of these apps, use the following command:
To download the source code of an open-source app available on the Platform, use dx get as follows, swapping in the app's name for app-cloud_workstation:
advanced_options - This optional input consists of a string of advanced command-line options to be passed to Sambamba, for use in merging the source BAM files. These options are applied when your applet launches Sambamba.
In the runSpec field's systemRequirements subfield, note the value "mem2_ssd1_v2_x4" in the instanceType field. This specifies that each of the applet's entry points should be run using the mem2_ssd1_v2_x4 instance type.
Download the Sambamba binary from the Sambamba releases page. Uncompress the executable and place it in your applet's resources/usr/bin/ subdirectory.
The first line, sometimes called the "shebang," specifies which interpreter should be used to run the remainder of the file - in this case, the Bash interpreter.
On the DNAnexus Platform, Bash helper variables can be used to set execution output filenames to follow a specific pattern. To leverage this feature, add the following line to your script:
$advanced_options passes along any advanced Sambamba command-line options you specified in the advanced_options input.
"${sorted_bams_path[@]}" specifies that Sambamba should use as inputs the files stored in the array sorted_bams. Adding _path enables your script to leverage a helper variable to specify that Sambamba should look for this array in the directory $HOME/in/sorted_bams/.
To ensure that ${sorted_bams_path[@]} enables your script to find your input files and provide them to Sambamba, you must use dx-download-all-inputs to download those files to $HOME/in, as described earlier in this tutorial.
Then your output file needs to be uploaded from the worker to the DNAnexus Platform. For this, use the utility dx-upload-all-outputs, which automatically uploads the contents of all subdirectories on the path $HOME/out/. To provide for this, add a line to your script that calls dx-upload-all-outputs.
If you have BAM files you'd like to merge using your applet, upload them to your project first. If you prefer, you can use sample BAM files already available on the Platform.
In this tutorial, you manually created the applet's local directory, dxapp.json file, and shell script (src/script.sh). These steps can be automated by using dx-app-wizard. Consult the dx-app-wizard documentation for guidance on using it.
See the developer tutorials for language-specific guidance. Each covers crafting complex applets and apps in a particular language.
You can download the source code of a number of open-source apps available for use on the DNAnexus Platform.