# Advanced Applet Tutorial

In this tutorial, you learn to create an advanced Bash applet. Your applet uses Sambamba, an open-source toolkit, to merge multiple BAM files into a single file.

## Before You Begin

### Learning about Sambamba

If you're not familiar with Sambamba, review the [Sambamba documentation](https://lomereiter.github.io/sambamba/index.html).

### Working on the DNAnexus Platform

Download the [DNAnexus SDK](https://documentation.dnanexus.com/downloads) if you haven't already done so.

If you're not familiar with the `dx` command-line client, review the [Command-Line Quickstart](https://documentation.dnanexus.com/getting-started/cli-quickstart).

If this is your first time writing an app for use on the DNAnexus Platform, also review the [Introduction to Building Apps](https://documentation.dnanexus.com/developer/apps/intro-to-building-apps).

## Step 1. Create the Applet Directory and Subdirectories

When you create an applet for use on the Platform, you start by creating a local directory structure to hold your source code and other resources. For this tutorial, you need to create the following directory structure:

```
sambamba_merge_applet/
├── dxapp.json
├── Readme.md
├── resources/
│   └── usr/
│       └── bin/
│
└── src/
    └── script.sh
```

To do this, open a terminal app on your local machine, navigate to a directory where you want to build your applet, then enter the following commands:

```shell
mkdir -p sambamba_merge_applet/resources/usr/bin
mkdir sambamba_merge_applet/src
```

## Step 2. Create the `dxapp.json` File

Next, use a text editor to create a file called `dxapp.json` and save it in the applet's root directory, `sambamba_merge_applet`, as shown in the previous section.

The `dxapp.json` file is a DNAnexus application metadata file. Its presence in a directory tells DNAnexus tools that the directory contains DNAnexus applet source code. Its component fields contain information about the applet and specifications for how it runs.

The file's structure and content should be as follows:

```json
{
  "name": "sambamba_merge_applet",
  "title": "Sambamba Mappings Merger",
  "summary": "Uses Sambamba to merge multiple sorted BAM files into a single BAM file",
  "version": "0.0.1",
  "inputSpec": [
    {
      "name": "sorted_bams",
      "label": "Sorted mappings",
      "help": "A set of coordinate-sorted BAM files to be merged.",
      "class": "array:file",
      "patterns": ["*.bam"]
    },

    {
      "name": "advanced_options",
      "label": "Advanced command line options",
      "help": "Advanced command line options that will be supplied directly to the Sambamba merge execution.",
      "class": "string",
      "optional": true
    }
  ],
  "outputSpec": [
    {
      "name": "merged_bam",
      "label": "Merged sorted mappings",
      "help": "A BAM file with the merged mappings.",
      "class": "file",
      "patterns": ["*.bam"]
    }
  ],
  "runSpec": {
    "interpreter": "bash",
    "file": "src/script.sh",
    "systemRequirements": {
      "*": {
        "instanceType": "mem2_ssd1_v2_x4"
      }
    },
    "distribution": "Ubuntu",
    "release": "24.04",
    "execDepends": []
  },
  "openSource": true
}
```

### Applet Name

The applet's name - `sambamba_merge_applet` - is specified in the `name` field.

### Inputs and Outputs

The `inputSpec` field is an array containing two objects. Each provides specs for one of the two inputs taken by the applet:

1. `sorted_bams` - This input consists of an array of BAM files. As indicated in the `patterns` subfield, each of these files must have a name that ends with the extension `.bam`.
2. `advanced_options` - This optional input consists of a string of advanced command-line options to be passed to Sambamba, for use in merging the source BAM files. See the [Sambamba merge documentation](https://lomereiter.github.io/sambamba/docs/sambamba-merge.html) for more on these options, and [Setting Up the File Merge](#setting-up-the-file-merge) for details on how to pass them to Sambamba when your applet launches it.

The `outputSpec` field is also an array, containing a single object that provides specs for the applet's single output:

1. `merged_bam` - A single BAM file, which, as indicated in the `patterns` subfield, must have a name that ends with the extension `.bam`.

### Run Specs

The value `bash` in the `runSpec` field's `interpreter` subfield specifies that the applet is a Bash script.

The value `src/script.sh` in the `runSpec` field's `file` subfield specifies that the worker running the applet should run the executable `script.sh`, located in the applet's `src` subdirectory.

In the `runSpec` field's `systemRequirements` subfield, the value `mem2_ssd1_v2_x4` in the `instanceType` field specifies that each of the applet's [entry points](https://documentation.dnanexus.com/developer/api/running-analyses/applets-and-entry-points) should be run using the `mem2_ssd1_v2_x4` [instance type](https://documentation.dnanexus.com/developer/api/running-analyses/instance-types).

### Internet Access

You can use the `access` key in your applet's `dxapp.json` file to configure its ability to access the internet. See the following documentation for more information:

* [Disabling internet access](https://documentation.dnanexus.com/developer/execution-environment#network-access)
* [Allowing full internet access](https://documentation.dnanexus.com/faqs/developing-apps-and-applets#how-do-i-request-network-access-for-my-app)
* [Restricting access to specific domains](https://documentation.dnanexus.com/faqs/developing-apps-and-applets#how-do-i-request-network-access-for-my-app)

## Step 3. Package the Executable With the Applet

Download the Sambamba binary from the [Sambamba releases page](https://github.com/lomereiter/sambamba/releases). Decompress the executable and place it in your applet's `resources/usr/bin/` subdirectory.

Run the following commands:

```shell
# Navigate to your applet root directory
cd /path/to/app/directory

# Untar the downloaded executable
tar -xzf /path/to/downloaded/sambamba_executable

# Rename and move the executable to the correct directory
# Note: if you don't rename the executable, make sure the
#       app source code uses the full name of the downloaded
#       sambamba executable.
mv sambamba_* resources/usr/bin/sambamba
```

## Step 4. Write the Applet Script

The next step is to write the script that is executed when the applet is launched.

### Script Setup

Start by using a text editor to create a file named `script.sh` and save it in the applet's `src` subdirectory.

Add these two lines at the top of the file:

```shell
#!/bin/bash
set -e -x -o pipefail
```

The first line, sometimes called the [shebang](https://en.wikipedia.org/wiki/Shebang_\(Unix\)), specifies which interpreter should be used to parse the rest of the file. In this case, it's the Bash interpreter.

The second line contains settings to be used in executing the script:

* The `-e` flag ensures that execution aborts on an error.
* The `-o pipefail` option ensures that a pipeline is treated as failed if any command in it fails, not only the last one.
* The `-x` flag ensures that Bash outputs each line as it is executed. This is useful for debugging.

{% hint style="info" %}
On the DNAnexus Platform, workers have the flag `-e` set by default. If you would like your script to continue to the end, regardless of any errors encountered during execution, replace `-e` with `+e` in the second line of the script.
{% endhint %}
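To see what `-o pipefail` changes, the following standalone sketch runs the same failing pipeline with and without the option. It is not part of the applet script, and the pipeline `false | cat` is only a stand-in for a real pipeline whose first stage fails.

```shell
#!/bin/bash
# Standalone demonstration; not part of the applet script.

# Without pipefail, a pipeline's status is that of its last command,
# so the failure of `false` is hidden by the success of `cat`.
status_without=0
( set +o pipefail; false | cat ) || status_without=$?

# With pipefail, the pipeline reports the failure of `false`.
status_with=0
( set -o pipefail; false | cat ) || status_with=$?

echo "without pipefail: $status_without"   # prints "without pipefail: 0"
echo "with pipefail:    $status_with"      # prints "with pipefail:    1"
```

Combined with `-e`, the second behavior means that a failure in any stage of a pipeline stops the script.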

### Configuring Inputs

On the DNAnexus Platform, an applet runs on a worker in the directory given by the environment variable `$HOME`. To be accessible to the execution, input files need to be in the subdirectory `$HOME/in`. Add the following line to your script file to have your input files automatically downloaded to `$HOME/in`:

```shell
dx-download-all-inputs
```

The applet's first input is named `sorted_bams` and defined as an array of files. When your input files are downloaded to `$HOME/in`, they are placed in a subdirectory `$HOME/in/sorted_bams`. Each file is placed in a separate subdirectory within `$HOME/in/sorted_bams`, with these subdirectories named with integers starting with 0, like the elements of an array.

For example, if you supply three files to the applet, named `SRR100022_chrom20_mapped_to_b37.bam`, `SRR100022_chrom21_mapped_to_b37.bam`, and `SRR100022_chrom22_mapped_to_b37.bam`, they are downloaded to:

* `$HOME/in/sorted_bams/0/SRR100022_chrom20_mapped_to_b37.bam`
* `$HOME/in/sorted_bams/1/SRR100022_chrom21_mapped_to_b37.bam`
* `$HOME/in/sorted_bams/2/SRR100022_chrom22_mapped_to_b37.bam`

The file structure for the inputs is then as follows:

```
$HOME
├── in
│   └── sorted_bams
│       ├── 0
│       │   └── SRR100022_chrom20_mapped_to_b37.bam
│       ├── 1
│       │   └── SRR100022_chrom21_mapped_to_b37.bam
│       └── 2
│           └── SRR100022_chrom22_mapped_to_b37.bam
│ ...
```

### Configuring Outputs

#### Creating an Output Directory

Next, add the following line to your script to create the directory `$HOME/out/merged_bam`, which stores your output file:

```shell
mkdir -p out/merged_bam
```

The directory's name, `merged_bam`, corresponds to the name of your applet's output parameter, as specified in the `dxapp.json` file. Because the names match, files in this directory are treated as the value of that output when the script uploads its results with `dx-upload-all-outputs`.

#### Setting the Output Filename

On the DNAnexus Platform, [environment variables](https://documentation.dnanexus.com/developer/bash#downloading-and-using-file-inputs) can be used to set execution output filenames to follow a specific pattern. To leverage this feature, add the following line to your script:

```shell
output_name="${sorted_bams_prefix[0]}_merged.bam"
```

This line specifies that your output file has a name with the following components:

1. `${sorted_bams_prefix[0]}` - This adds to the output filename the prefix of the first file used as an input, that is, that file's name, with its extension or extensions removed. `sorted_bams` refers to the name of the array that contains your input files. `_prefix` and `[0]` specify the prefix of the file stored in the first element of that array. The braces in `${...}` are required for Bash to expand a single array element; without them, `$sorted_bams_prefix[0]` would expand to the first element followed by the literal text `[0]`.
2. `_merged` - This string adds an additional descriptive element to the filename, making clear it consists of multiple files merged into one.
3. `.bam` - This extension defines the file as a `.bam`, or Binary Alignment Map, file.
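The way the filename is assembled can be tried outside the Platform. In this minimal sketch, `sorted_bams_prefix` is filled in by hand from a hypothetical list, `example_names`, by stripping a trailing `.bam`. On a worker, the Platform sets `sorted_bams_prefix` for you, so the stripping step here is only an approximation of that behavior.

```shell
#!/bin/bash
# Hypothetical input filenames, used only for illustration.
example_names=(
  SRR100022_chrom20_mapped_to_b37.bam
  SRR100022_chrom21_mapped_to_b37.bam
)

# Assumption: approximate the Platform's prefix by removing a trailing ".bam".
sorted_bams_prefix=()
for name in "${example_names[@]}"; do
  sorted_bams_prefix+=("${name%.bam}")
done

# Same line as in the applet script.
output_name="${sorted_bams_prefix[0]}_merged.bam"
echo "$output_name"   # prints "SRR100022_chrom20_mapped_to_b37_merged.bam"
```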

### Setting Up the File Merge

Next, add a line to your script that launches Sambamba and has it merge your input files into a single file:

```shell
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"
```

To break this down:

* `sambamba merge` invokes the Sambamba merge function.
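* `$advanced_options` expands to any advanced Sambamba command-line options supplied through the optional `advanced_options` input, which you defined when [creating the applet's `dxapp.json` file](#step-2-create-the-dxappjson-file). It is left unquoted so that Bash splits the string into separate arguments.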
* `"$output_name"` gives the output file the name that, in the previous step, you stored in the variable `output_name`.
* `"${sorted_bams_path[@]}"` expands to the local path of every file in the `sorted_bams` input, each passed to Sambamba as a separate argument. The `_path` suffix refers to a [DNAnexus Platform environment variable](https://documentation.dnanexus.com/developer/bash#downloading-and-using-file-inputs) that holds the location of each downloaded file under `$HOME/in/sorted_bams/`.

{% hint style="info" %}
To ensure that `${sorted_bams_path[@]}` enables your script to find your input files and provide them to Sambamba, you must use `dx-download-all-inputs` to download those files to `$HOME/in`, per [the instructions above](#configuring-inputs).
{% endhint %}

When your script is run, Bash expands the variables you've included in the `merge` command. For example, if you have three input files named `NA12878.chr1.bam`, `NA12878.chr2.bam`, and `NA12878.chr3.bam`, and you supply no advanced options, the command expands as follows:

```shell
sambamba merge \
  "NA12878.chr1_merged.bam" \
  "$HOME/in/sorted_bams/0/NA12878.chr1.bam" \
  "$HOME/in/sorted_bams/1/NA12878.chr2.bam" \
  "$HOME/in/sorted_bams/2/NA12878.chr3.bam"
```
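The expansion can be checked without a real Sambamba binary. In the following dry-run sketch, a stub shell function named `sambamba` stands in for the executable and prints each argument it receives. The variable values, including the options `-t 4 -l 6`, are set by hand for illustration. On a worker, the Platform provides them.

```shell
#!/bin/bash
# Dry run: a stub function replaces the real sambamba binary and
# records every argument it is given.
received=()
sambamba() {
  received=("$@")
  printf '[%s]\n' "$@"
}

# Values the Platform would normally provide, set by hand here.
advanced_options="-t 4 -l 6"   # illustrative options only
sorted_bams_prefix=(NA12878.chr1 NA12878.chr2)
sorted_bams_path=(
  "$HOME/in/sorted_bams/0/NA12878.chr1.bam"
  "$HOME/in/sorted_bams/1/NA12878.chr2.bam"
)
output_name="${sorted_bams_prefix[0]}_merged.bam"

# Same line as in the applet script. $advanced_options is unquoted on
# purpose, so Bash splits it into four separate arguments.
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"
```

The stub receives eight arguments: `merge`, the four option words, the output filename, and one path per input file. If `advanced_options` is empty or unset, the unquoted expansion contributes no arguments at all.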

#### Setting Up the File Merge Without Using Environment Variables

If you prefer not to use the `sorted_bams_path` environment variable, you can instead locate the downloaded input files with a glob pattern, as in the following merge command:

```shell
sambamba merge $advanced_options "$output_name" "$HOME"/in/sorted_bams/*/*
```
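One thing to keep in mind with a glob is that Bash sorts the matches lexically, not numerically. The sketch below builds a throwaway input tree with eleven empty placeholder files (the `part_N.bam` names are hypothetical) and shows that the directory `10` sorts before the directory `2`. This matters only if your script depends on the order of the files.

```shell
#!/bin/bash
# Build a throwaway input tree with 11 placeholder files (indices 0-10).
fake_home="$(mktemp -d)"
for i in {0..10}; do
  mkdir -p "$fake_home/in/sorted_bams/$i"
  touch "$fake_home/in/sorted_bams/$i/part_$i.bam"
done

# Collect the glob matches in an array. Matches come back in lexical
# order of the path: 0, 1, 10, 2, 3, ...
globbed=("$fake_home"/in/sorted_bams/*/*)
third="$(basename "${globbed[2]}")"

echo "files found: ${#globbed[@]}"   # prints "files found: 11"
echo "third match: $third"           # prints "third match: part_10.bam"

rm -rf "$fake_home"
```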

### Uploading Results with dx-upload-all-outputs

After Sambamba merges your input files, the output file needs to be moved to the `$HOME/out/merged_bam` folder on the worker. To provide for this, add the following line to your script:

```shell
mv "$output_name" out/merged_bam/
```

Then your output file needs to be uploaded from the worker to the DNAnexus Platform. For this, use the utility [`dx-upload-all-outputs`](https://documentation.dnanexus.com/user/helpstrings-of-sdk-command-line-utilities#dx-upload-all-outputs), which automatically uploads the contents of all subdirectories on the path `$HOME/out/`. To provide for this, add the following line to your script:

```shell
dx-upload-all-outputs
```
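Before relying on the upload step, it can help to confirm what the script has staged. The following sketch simulates the end of the script in a temporary directory, using an empty placeholder file and a hypothetical filename, and checks that `out/merged_bam` holds exactly one file, as you would expect for an output declared with class `file`.

```shell
#!/bin/bash
# Simulate the end of the applet script in a temporary working directory.
work_dir="$(mktemp -d)"
output_name="example_merged.bam"       # hypothetical output filename
mkdir -p "$work_dir/out/merged_bam"
touch "$work_dir/$output_name"         # placeholder for the merged BAM
mv "$work_dir/$output_name" "$work_dir/out/merged_bam/"

# merged_bam is declared as a single file, so expect exactly one entry.
staged=("$work_dir"/out/merged_bam/*)
staged_name="$(basename "${staged[0]}")"
echo "staged for upload: ${#staged[@]} file(s): $staged_name"

rm -rf "$work_dir"
```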

## Step 5. Build and Run the Applet

You've completed your script. It should read as follows:

```shell
#!/bin/bash
set -e -x -o pipefail
dx-download-all-inputs
mkdir -p out/merged_bam
output_name="${sorted_bams_prefix[0]}_merged.bam"
sambamba merge $advanced_options "$output_name" "${sorted_bams_path[@]}"
mv "$output_name" out/merged_bam/
dx-upload-all-outputs
```

You're ready to build and run your applet.

If you haven't yet done so, log in to the DNAnexus Platform from your terminal app with `dx login`. Then select the project in which you'd like to work, for example with `dx select`.

### Selecting BAM Files to Merge

If you have BAM files you'd like to merge using your applet, [upload them to your project](https://documentation.dnanexus.com/user/objects/uploading-and-downloading-files/small-sets-of-files/uploading-using-dx). If you prefer, you can use the BAM files available in the [Demo Data public project](https://platform.dnanexus.com/projects/BQbJpBj0bvygyQxgQ1800Jkk/data/Developer%20Quickstart).

If you choose to upload your own data, test your applet by doing an initial run that uses small files.

### Building and Running Your Applet

In your terminal app, enter the following commands to build and run your applet:

```shell
dx build path/to/app/directory
dx run sambamba_merge_applet
```
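When you run `dx run` without input arguments, it prompts you for each input. You can also supply inputs on the command line with `-i<input name>=<value>`, repeating the flag once per file for an array input. The sketch below assembles such a command for two hypothetical filenames and prints it instead of running it. The `-y` flag skips the confirmation prompt and `--watch` streams the job log.

```shell
#!/bin/bash
# Hypothetical BAM files assumed to exist in the current project.
bam_files=(sample_chr20.bam sample_chr21.bam)

# Repeat -isorted_bams=... once per file to fill the array input.
run_args=()
for f in "${bam_files[@]}"; do
  run_args+=("-isorted_bams=$f")
done

# Assemble the full command. It is printed here rather than executed.
run_cmd=(dx run sambamba_merge_applet "${run_args[@]}" -y --watch)
echo "${run_cmd[*]}"
```

To launch the job for real, replace the final `echo` line with `"${run_cmd[@]}"`.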

## Learn More

### Using dx-app-wizard

In this tutorial, you manually created the applet's local directory, `dxapp.json` file, and shell script (`src/script.sh`). You can automate these steps by using `dx-app-wizard`. Consult the [Intro to Building Apps tutorial](https://documentation.dnanexus.com/developer/apps/intro-to-building-apps) for guidance on using `dx-app-wizard`.

The `dx-app-wizard` does have certain limitations. For example, it does not prompt you to provide advanced configuration settings, such as `instanceType` specs, or the `patterns` settings you added here to your applet's `inputSpec` and `outputSpec` definitions. The wizard also omits the use of utilities like `dx-download-all-inputs` or `dx-upload-all-outputs`.

`dx-app-wizard` can still speed up your work, even if you want the advanced functionality available when you create an applet manually. For example, you can use `dx-app-wizard` to create the applet's directory structure and a basic `dxapp.json` file. Then use a text editor to add fields to the `dxapp.json` file as needed, and replace the Bash script created by `dx-app-wizard` with your own script.

### Language-Specific Tutorials

See the [Developer Tutorials](https://documentation.dnanexus.com/getting-started/developer-tutorials) page for language-specific tutorials. Each provides guidance on crafting complex applets and apps in a particular language.

### Getting Sample Code

You can download the source code of open-source apps available for use on the DNAnexus Platform, such as [Cloud Workstation](https://documentation.dnanexus.com/developer/cloud-workstation).

To get a list of these apps, use the following command:

```shell
dx api system findApps \
  '{"describe":{"fields":{"openSource": true, "name": true}}}' | \
  jq '.results|.[]|select(.describe.openSource)|.describe.name'
```

To download the source code of an open-source app available on the Platform, use `dx get` as follows, swapping in the app's name for `app-cloud_workstation`:

```shell
dx get app-cloud_workstation
```
