
Copyright 2025 DNAnexus

Advanced Applet Tutorial

Learn to use Sambamba to create advanced Bash applets for use on the DNAnexus Platform.

In this tutorial, you'll learn to create an advanced Bash applet. Your applet will use Sambamba, an open-source toolkit, to merge multiple BAM files into a single file.

Before You Begin

Learning about Sambamba

If you're not familiar with Sambamba, review the Sambamba documentation.

Working on the DNAnexus Platform

Download the DNAnexus SDK if you haven't already done so.

If you're not familiar with the dx command-line client, review the Command-Line Quickstart.

If this is your first time writing an app for use on the DNAnexus Platform, also review the Introduction to Building Apps.

Step 1. Create the Applet Directory and Subdirectories

When you create an applet for use on the Platform, you start by creating a local directory structure to hold your source code and other resources. For this tutorial, you'll need to create the following directory structure:

sambamba_merge_applet/
├── dxapp.json
├── Readme.md
├── resources/
│   └── usr/
│       └── bin/
│
└── src/
    └── script.sh

To do this, open a terminal app on your local machine, navigate to a directory where you want to build your applet, then enter the following commands:

mkdir -p sambamba_merge_applet/resources/usr/bin
mkdir sambamba_merge_applet/src

Step 2. Create the dxapp.json File

Next use a text editor to create a file called dxapp.json, and, as shown in the previous section, save it in the applet's root directory, sambamba_merge_applet.

The dxapp.json file is a DNAnexus application metadata file. Its presence in a directory tells DNAnexus tools that it contains DNAnexus applet source code. Its component fields contain information about the applet, and specifications for how it will be run.

The file's structure and content should be as follows:

{
  "name": "sambamba_merge_applet",
  "title": "Sambamba Mappings Merger",
  "summary": "Uses Sambamba to merge multiple sorted BAM files into a single BAM file",
  "version": "0.0.1",
  "inputSpec": [
    {
      "name": "sorted_bams",
      "label": "Sorted mappings",
      "help": "A set of coordinate-sorted BAM files to be merged.",
      "class": "array:file",
      "patterns": ["*.bam"]
    },
    {
      "name": "advanced_options",
      "label": "Advanced command line options",
      "help": "Advanced command line options that will be supplied directly to the Sambamba merge execution.",
      "class": "string",
      "optional": true
    }
  ],
  "outputSpec": [
    {
      "name": "merged_bam",
      "label": "Merged sorted mappings",
      "help": "A BAM file with the merged mappings.",
      "class": "file",
      "patterns": ["*.bam"]
    }
  ],
  "runSpec": {
    "interpreter": "bash",
    "file": "src/script.sh",
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "distribution": "Ubuntu",
    "release": "24.04",
    "execDepends": []
  },
  "openSource": true
}

Applet Name

The applet's name - sambamba_merge_applet - is specified in the name field.

Inputs and Outputs

The inputSpec field is an array containing two objects. Each provides specs for one of the two inputs taken by the applet:

  1. sorted_bams - This input consists of an array of BAM files. As indicated in the patterns subfield, each of these files must have a name that ends with the extension .bam.

  2. advanced_options - This optional input consists of a string of advanced command-line options to be passed to Sambamba, for use in merging the source BAM files. See the Sambamba documentation for more on these options. See below for details on how to include these options when your applet launches Sambamba.

The outputSpec field is also an array, containing a single object that provides specs for the applet's single output:

  1. merged_bam - A single BAM file, which, as indicated in the patterns subfield, must have a name that ends with the extension .bam.

Run Specs

The value "bash" in the runSpec field's interpreter subfield specifies that the applet is a Bash script.

The value "src/script.sh" in the runSpec field's file subfield specifies that the worker running the applet should run the executable script.sh, located in the applet's src subdirectory.

In the runSpec field's systemRequirements subfield, note the value "mem2_ssd1_v2_x4" in the instanceType field. Because it is listed under the wildcard key "*", it specifies that each of the applet's entry points should be run using the mem2_ssd1_v2_x4 instance type.

Internet Access

You can use the access key in your applet's dxapp.json file to configure its ability to access the internet. See the following documentation for more information:

  • Disabling internet access

  • Allowing full internet access

  • Restricting access to specific domains

Step 3. Package the Executable With the Applet

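Download the Sambamba binary from the Sambamba releases page, then uncompress the executable and place it in your applet's resources/usr/bin/ subdirectory. Files under resources/ are unpacked into the root of the worker's filesystem, so the binary will be available at /usr/bin/sambamba when the applet runs.

To do this, run the following commands: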

# Navigate to your applet root directory
cd /path/to/app/directory

# Untar the downloaded executable
tar -xzf /path/to/downloaded/sambamba_executable

# Rename and move the executable to the correct directory
# Note: if you don't rename the executable, make sure the
#       app source code uses the full name of the downloaded
#       sambamba executable.
mv sambamba_* resources/usr/bin/sambamba
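Before moving on, it can help to confirm that the applet directory has the pieces described so far. The following is a minimal sketch, not part of the DNAnexus tooling: check_applet_layout is a hypothetical helper name, and the demonstration runs against a scratch directory with empty placeholder files rather than a real Sambamba binary.

```shell
#!/bin/bash
# Hypothetical helper: report whether an applet directory contains the
# files this tutorial relies on. Returns 0 only if all are present.
check_applet_layout() {
  local root=$1 missing=0
  [ -f "$root/dxapp.json" ] || { echo "missing: dxapp.json"; missing=1; }
  [ -f "$root/src/script.sh" ] || { echo "missing: src/script.sh"; missing=1; }
  [ -x "$root/resources/usr/bin/sambamba" ] || {
    echo "missing or not executable: resources/usr/bin/sambamba"; missing=1; }
  return "$missing"
}

# Demonstration on a scratch copy of the layout, using empty placeholders.
demo=$(mktemp -d)/sambamba_merge_applet
mkdir -p "$demo/resources/usr/bin" "$demo/src"
touch "$demo/dxapp.json" "$demo/src/script.sh" "$demo/resources/usr/bin/sambamba"

# The placeholder is not executable yet, so the first check reports a problem.
check_applet_layout "$demo" || echo "layout incomplete"

# After setting the executable bit, the check passes.
chmod +x "$demo/resources/usr/bin/sambamba"
check_applet_layout "$demo" && echo "layout complete"
```

To use the helper on your own applet, call it with the path to the applet's root directory in place of "$demo".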

Step 4. Write the Applet Script

The next step is to write the script that will be executed when the applet is launched.

Script Setup

Start by using a text editor to create a file named script.sh and save it in the applet's src subdirectory.

Add these two lines at the top of the file:

#!/bin/bash
set -e -x -o pipefail

The first line, sometimes called the shebang, specifies which interpreter should be used to parse the remainder of the file. In this case, it's the Bash interpreter.

The second line contains settings to be used in executing the script:

  • The -e flag ensures that execution will abort on an error.

  • The -o pipefail ensures that Bash will throw an error if it encounters an error within a pipeline.

  • The -x flag ensures that Bash will output each line as it is executed. This is useful for debugging.

Note that on the DNAnexus Platform, workers have the flag -e set by default. If you would like your script to keep running to the end, regardless of any errors encountered during execution, replace -e with +e in the second line of the script.
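The effect of -o pipefail is easiest to see on a pipeline whose first stage fails and whose last stage succeeds. The sketch below is a standalone illustration using only Bash built-ins; the variable names are chosen for this example.

```shell
#!/bin/bash
# Compare a pipeline's exit status with and without pipefail.
# "false | true" fails in its first stage but succeeds in its last.

# Without pipefail, the pipeline reports the status of its last command.
set +o pipefail
false | true && without_pipefail=0 || without_pipefail=$?

# With pipefail, a failure in any stage makes the whole pipeline fail.
set -o pipefail
false | true && with_pipefail=0 || with_pipefail=$?

echo "without pipefail: $without_pipefail"
echo "with pipefail:    $with_pipefail"
```

With -e also set, the second form of failure would abort the script, which is usually what you want when an early stage of a pipeline produces bad data.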

Configuring Inputs

On the DNAnexus Platform, the execution of an applet on a worker runs in the directory given by the environment variable $HOME. To be accessible to the execution, inputs need to be in the subdirectory $HOME/in. Add the following line to your script file, to have your input files automatically downloaded to $HOME/in:

dx-download-all-inputs

Note that you named your applet's first input sorted_bams, and defined it as an array of files. As such, when your input files are downloaded to $HOME/in, they will be placed in a subdirectory $HOME/in/sorted_bams. Each file will be placed in a separate subdirectory within $HOME/in/sorted_bams, with these subdirectories named with integers starting with 0, like the elements of an array.

For example, if you supply three files to the applet, named SRR100022_chrom20_mapped_to_b37.bam, SRR100022_chrom21_mapped_to_b37.bam, and SRR100022_chrom22_mapped_to_b37.bam, they will be downloaded to:

$HOME/in/sorted_bams/0/SRR100022_chrom20_mapped_to_b37.bam
$HOME/in/sorted_bams/1/SRR100022_chrom21_mapped_to_b37.bam
$HOME/in/sorted_bams/2/SRR100022_chrom22_mapped_to_b37.bam

The file structure for the inputs will then be as follows:

$HOME
├── in
│   └── sorted_bams
│       ├── 0
│       │   └── SRR100022_chrom20_mapped_to_b37.bam
│       ├── 1
│       │   └── SRR100022_chrom21_mapped_to_b37.bam
│       └── 2
│           └── SRR100022_chrom22_mapped_to_b37.bam
│ ...
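To get a feel for this layout without running a job, you can reproduce it locally. The sketch below is an emulation only: it builds the same numbered subdirectories in a scratch directory, with empty placeholder files instead of real BAM files, and then walks them in array order. The names workdir and local_paths are hypothetical and are not set by the Platform.

```shell
#!/bin/bash
# Emulate the directory layout that dx-download-all-inputs creates,
# using a scratch directory and empty placeholder files.
workdir=$(mktemp -d)
names=(SRR100022_chrom20_mapped_to_b37.bam
       SRR100022_chrom21_mapped_to_b37.bam
       SRR100022_chrom22_mapped_to_b37.bam)
for i in "${!names[@]}"; do
  mkdir -p "$workdir/in/sorted_bams/$i"
  : > "$workdir/in/sorted_bams/$i/${names[$i]}"
done

# Walk the numbered subdirectories in array order (0, 1, 2, ...),
# collecting the path of each file found.
local_paths=()
i=0
while [ -d "$workdir/in/sorted_bams/$i" ]; do
  for f in "$workdir/in/sorted_bams/$i"/*; do
    local_paths+=("$f")
  done
  i=$((i + 1))
done

# Print the paths relative to the scratch directory.
printf '%s\n' "${local_paths[@]#"$workdir"/}"
```

Walking the subdirectories by index, rather than with a plain glob, keeps the order stable once there are ten or more inputs, because a glob would sort 10 before 2.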

Configuring Outputs

Creating an Output Directory

Next add the following line to your script, creating a directory $HOME/out/merged_bam, to store your output file:

mkdir -p out/merged_bam

Note that the directory's name, merged_bam, corresponds to the name of your applet's output parameter, as specified in the dxapp.json file. This ensures that files in this directory are treated as values of that output when they are uploaded at the end of the script.

Setting the Output Filename

On the DNAnexus Platform, environment variables can be used to set execution output filenames to follow a specific pattern. To leverage this feature, add the following line to your script:

output_name="${sorted_bams_prefix[0]}_merged.bam"

This line specifies that your output file will have a name with the following components:

  1. ${sorted_bams_prefix[0]} - This adds to the output filename the prefix of the first file used as an input, that is, that file's name with its extension or extensions removed. sorted_bams is the name of the input that holds your files. The _prefix suffix and the index [0] select the prefix of the file stored in the first element of that array. The ${} syntax is required for Bash to read the index as part of the variable reference. Without the braces, Bash would expand only $sorted_bams_prefix and treat [0] as literal text.

  2. _merged - This string adds an additional descriptive element to the filename, making clear it consists of multiple files merged into one.

  3. .bam - This extension defines the file as a .bam, or Binary Alignment Map, file.
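The way the output name is assembled can be tried locally. In the sketch below, the sorted_bams_prefix array is filled in by hand to stand in for the variable the Platform provides inside a job. The example_names array is hypothetical, and stripping the .bam suffix is an approximation of how the prefix is derived for files matching the *.bam pattern.

```shell
#!/bin/bash
# Hypothetical input file names, standing in for the applet's inputs.
example_names=(SRR100022_chrom20_mapped_to_b37.bam
               SRR100022_chrom21_mapped_to_b37.bam)

# Approximate the prefix of each file by removing its .bam extension.
sorted_bams_prefix=()
for name in "${example_names[@]}"; do
  sorted_bams_prefix+=("${name%.bam}")
done

# Same line as in the applet script: prefix of the first input, plus a suffix.
output_name="${sorted_bams_prefix[0]}_merged.bam"
echo "$output_name"
```

With these example names, the script prints SRR100022_chrom20_mapped_to_b37_merged.bam.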

Setting Up the File Merge

Next add a line to your script that will launch Sambamba and have it merge your input files into a single file:

sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"

To break this down:

  • sambamba merge invokes Sambamba's merge function.

  • $advanced_options expands to any advanced Sambamba command-line options supplied when the applet is run, through the optional advanced_options input you defined in the applet's dxapp.json file.

  • "$output_name" gives the output file the name that, in the previous step, you stored in the variable output_name.

  • "${sorted_bams_path[@]}" specifies the input files for Sambamba. Adding _path to the input name sorted_bams gives a Bash array, provided by the DNAnexus Platform, that holds the full local path of each downloaded file under $HOME/in/sorted_bams/.

To ensure that ${sorted_bams_path[@]} enables your script to find your input files and provide them to Sambamba, you must use dx-download-all-inputs to download those files to $HOME/in, as per the instructions above.

When your script is run, Bash will automatically expand the variables you've included in the merge command. So if, for example, you have three input files named NA12878.chr1.bam, NA12878.chr2.bam, and NA12878.chr3.bam, and you supply no advanced options, Bash will interpret your code as follows:

sambamba merge \
  "NA12878.chr1_merged.bam" \
  "$HOME/in/sorted_bams/0/NA12878.chr1.bam" \
  "$HOME/in/sorted_bams/1/NA12878.chr2.bam" \
  "$HOME/in/sorted_bams/2/NA12878.chr3.bam"
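The merge command leaves $advanced_options unquoted while quoting the other variables. The sketch below illustrates why that matters, using a small helper that counts the arguments it receives. The helper name count_args and the option string "-t 4 -l 6" are example values chosen for this illustration.

```shell
#!/bin/bash
# Print the number of arguments received.
count_args() { echo "$#"; }

advanced_options="-t 4 -l 6"                    # example value only

# Unquoted: Bash splits the string on whitespace into separate arguments.
unquoted=$(count_args $advanced_options)

# Quoted: the whole string is passed as a single argument.
quoted=$(count_args "$advanced_options")

# When the optional input is not supplied, the variable is empty.
advanced_options=""
empty_unquoted=$(count_args $advanced_options)  # expands to no arguments
empty_quoted=$(count_args "$advanced_options")  # passes one empty argument

echo "$unquoted $quoted $empty_unquoted $empty_quoted"
```

This prints 4 1 0 1. Leaving the variable unquoted lets each option reach Sambamba as its own argument, and lets an empty value disappear entirely. The trade-off is that an option value containing spaces cannot be passed this way.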

Setting Up the File Merge Without Using Environment Variables

You can use the following merge command in your shell script, if for some reason you don't want it to rely on the sorted_bams_path variable provided by the DNAnexus Platform. It uses a glob pattern to find the input files in the numbered subdirectories of $HOME/in/sorted_bams:

sambamba merge $advanced_options "$output_name" "$HOME"/in/sorted_bams/*/*

Uploading Results with dx-upload-all-outputs

After Sambamba merges your input files, the output file needs to be moved to the $HOME/out/merged_bam folder on the worker. To provide for this, add the following line to your script:

mv "$output_name" out/merged_bam/

Then your output file needs to be uploaded from the worker to the DNAnexus Platform. For this, use the utility dx-upload-all-outputs, which will automatically upload the contents of all subdirectories on the path $HOME/out/. To provide for this, add the following line to your script:

dx-upload-all-outputs

Step 5. Build and Run the Applet

You've completed your script. It should read as follows:

#!/bin/bash
set -e -x -o pipefail
dx-download-all-inputs
mkdir -p out/merged_bam
output_name="${sorted_bams_prefix[0]}_merged.bam"
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"
mv "$output_name" out/merged_bam/
dx-upload-all-outputs
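If you'd like to check the script's flow before building, one option is a local dry run with stand-ins for the tools that only exist on a worker. The sketch below is an assumption-heavy illustration, not a real test of the applet: the sambamba function merely concatenates its inputs to mimic the argument order, the two dx helpers are replaced by no-ops, and the variables that the Platform would normally provide are set by hand. The name sandbox and the placeholder files a.bam and b.bam are hypothetical.

```shell
#!/bin/bash
# Local dry run of the script's logic with stand-in tools.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/in/sorted_bams/0" "$sandbox/home/in/sorted_bams/1"

# Stand-in for sambamba: drops the "merge" subcommand, takes the next
# argument as the output, and concatenates the rest. NOT a real BAM merge,
# and it does not handle advanced options.
sambamba() {
  shift
  local out=$1
  shift
  cat "$@" > "$out"
}

# Stand-ins for the dx helpers: they do nothing in this dry run.
dx-download-all-inputs() { :; }
dx-upload-all-outputs() { :; }

# Placeholder inputs, and the variables the Platform would normally set.
echo "first" > "$sandbox/home/in/sorted_bams/0/a.bam"
echo "second" > "$sandbox/home/in/sorted_bams/1/b.bam"
sorted_bams_path=("$sandbox/home/in/sorted_bams/0/a.bam"
                  "$sandbox/home/in/sorted_bams/1/b.bam")
sorted_bams_prefix=(a b)
advanced_options=""

# The body of the applet script, run from the stand-in home directory.
cd "$sandbox/home"
dx-download-all-inputs
mkdir -p out/merged_bam
output_name="${sorted_bams_prefix[0]}_merged.bam"
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"
mv "$output_name" out/merged_bam/
dx-upload-all-outputs

ls out/merged_bam
```

After the run, out/merged_bam should contain a single file named a_merged.bam, which is the location dx-upload-all-outputs would read from on a worker.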

You're ready to build and run your applet.

If you haven't yet done so, log into the DNAnexus Platform using your terminal app. Then select the project in which you'd like to work.

Selecting BAM Files to Merge

If you have BAM files you'd like to merge using your applet, upload them to your project. If you prefer, you can use the BAM files available in the Demo Data public project.

If you choose to upload your own data, test your applet by doing an initial run that uses small files.

Building and Running Your Applet

In your terminal app, enter the following commands to build your applet with dx build and then run it with dx run:

dx build path/to/app/directory
dx run sambamba_merge_applet
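Inputs can also be given on the command line with the -i option of dx run, repeating it once per file for an array input. The sketch below only assembles and prints such a command, so it can be tried without Platform access. The file names and the option string are hypothetical examples, and the final line that would launch the job is left commented out.

```shell
#!/bin/bash
# Hypothetical BAM files in the current project, and example options.
inputs=(SRR100022_chrom20_mapped_to_b37.bam
        SRR100022_chrom21_mapped_to_b37.bam)
options="-t 4"

# Build the command as an array so that each argument stays intact.
cmd=(dx run sambamba_merge_applet)
for f in "${inputs[@]}"; do
  cmd+=("-isorted_bams=$f")          # one -i per element of the array input
done
cmd+=("-iadvanced_options=$options")

# Print the command, shell-quoted, for review.
printf '%q ' "${cmd[@]}"
echo

# To launch the job, uncomment the next line:
# "${cmd[@]}"
```

Building the command as an array keeps the option string "-t 4" together as the value of a single input, even though it contains a space.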

Learn More

Using dx-app-wizard

In this tutorial, you manually created the applet's local directory, dxapp.json file, and shell script (src/script.sh). These steps can be automated by using dx-app-wizard. Consult the Introduction to Building Apps tutorial for guidance on using dx-app-wizard.

Note that dx-app-wizard does have certain limitations. It does not, for example, prompt you to provide advanced configuration settings, such as instanceType specs, and the patterns settings you added here to your applet's inputSpec and outputSpec definitions. In addition, it does not use either dx-download-all-inputs or dx-upload-all-outputs.

dx-app-wizard can still be useful as a tool to speed your work, even if you want to use the advanced functionality available when you manually perform all the steps involved in creating an applet. You can, for example, use dx-app-wizard to create the applet's directory structure and a basic dxapp.json file. Then use a text editor to add further fields to the dxapp.json file as needed, and replace the Bash script created by dx-app-wizard with your own script.

Language-Specific Tutorials

See the Developer Tutorials page for language-specific tutorials. Each provides guidance in crafting complex applets and apps, in a particular language.

Getting Sample Code

You can download the source code of a number of open-source apps available for use on the DNAnexus Platform, such as Cloud Workstation.

To get a list of these apps, use the following command:

dx api system findApps \
  '{"describe":{"fields":{"openSource": true, "name": true}}}'| \
  jq '.results|.[]|select(.describe.openSource)|.describe.name'

To download the source code of an open-source app available on the Platform, use dx get as follows, swapping in the app's name for app-cloud_workstation:

dx get app-cloud_workstation
