# Third Party App Style Guide

This document provides guidelines for app and applet development. While this guide outlines best practices, we recommend using the following industry-standard style guides to ensure that code is clean and maintainable:

* Python - [PEP 8 – Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/) and the [Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md)
* Bash - [Google Shell Style Guide](https://google.github.io/styleguide/shell.xml), checked with [`shellcheck`](https://github.com/koalaman/shellcheck) and formatted with [`shfmt`](https://github.com/mvdan/sh)

As with any style guide, it is okay, even recommended, to deviate from it when you have a good reason or when the code you are extending follows a different convention.

## `dxapp.json`

The [`dxapp.json` JSON file](https://documentation.dnanexus.com/developer/apps/app-metadata) establishes the convention that users of the DNAnexus Platform use to create app(let)s on the platform. When defining applications for wide use, this style guide sets standards for user-friendly UI, CLI, and runtime app(let) definitions.

### Name

Applet names should be all lowercase with words separated by underscores.

```json
"name": "hisat2_mapper"
```
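
The naming rule can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `is_valid_applet_name` and the exact regular expression are assumptions made for illustration, and the Platform itself may accept names that this check rejects.

```python
import re

# Assumed reading of the rule: lowercase letters and digits, grouped into
# words that are joined by single underscores.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_applet_name(name):
    """Return True if the name is all lowercase with underscore-separated words."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_applet_name("hisat2_mapper"))  # True
print(is_valid_applet_name("Hisat2-Mapper"))  # False
```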

### Summary

An app summary must be concise. It should fit on one line and not terminate in a period. The current app(let) is the assumed subject of the summary.

#### Good

```json
"summary": "Merges multiple BAM files into a single one"
```

```json
"summary": "Maps FASTQ reads (paired or unpaired) to a reference genome with the BWA-MEM algorithm"
```

#### Bad

```json
"summary": "This app takes multiple BAM file inputs. Then uses a SAMtools(v1.3.1) to merge and  output a single BAM file."
```

The subject *"This app"* is unnecessary. The sentence is too verbose in its explanation and should rely on the subject being assumed.
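
The summary rules above can be turned into a quick lint step. This is a rough sketch under stated assumptions: `check_summary` is a hypothetical helper, and the "This app" check is only a simple prefix test, not a full grammar check.

```python
def check_summary(summary):
    """Return a list of style problems found in a dxapp.json summary."""
    problems = []
    if "\n" in summary:
        problems.append("summary should fit on one line")
    if summary.rstrip().endswith("."):
        problems.append("summary should not end with a period")
    # Also catches "This applet", which starts with the same prefix.
    if summary.lower().startswith("this app"):
        problems.append("the app(let) is the assumed subject; drop 'This app'")
    return problems


print(check_summary("Merges multiple BAM files into a single one"))  # []
```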

### App Execution Environment

Make sure the app runs in the [Python 3 App Execution Environment](https://documentation.dnanexus.com/faqs/eol-documentation/python-2-7-deprecation-and-migration-to-python-3) by setting `runSpec` accordingly. For example, to run a Python app in the Ubuntu 24.04 Python 3 environment, specify:

```json
"runSpec": {"release": "24.04", "version": "0", "interpreter": "python3", ...}
```

For a Bash app in the Ubuntu 24.04 Python 3 environment, specify:

```json
"runSpec": {"release": "24.04", "version": "0", "interpreter": "bash", ...}
```

### Licenses

List the licenses of the dependency software and packages you install in the `upstreamProjects` property of `dxapp.json`. If the packages you explicitly installed pull in additional, hidden layers of dependencies, it is the package author's responsibility to list the appropriate licenses.

The following keys are required to ensure compliance with open-source licenses: `name`, `repoUrl`, `version`, `license`, and `licenseUrl`. The `author` key is optional but good to have.

{% hint style="success" %}
When reviewing licenses, check for any additional requirements or restrictions that may apply to your use case.
{% endhint %}

Consider the following requirements:

* `license`: See [SPDX standards](https://spdx.org/licenses/) for license abbreviations.
* `licenseUrl`
  * When the software is maintained on GitHub:
    1. Find the link to the `LICENSE` or `COPYING` file at the tag or commit of the software version you use.
    2. Use a permalink so that the hyperlink stays pinned to that particular version. In your browser, [press "y" to get a permanent link](https://docs.github.com/en/repositories/working-with-files/using-files/getting-permanent-links-to-files#press-y-to-permalink-to-a-file-in-a-specific-commit).
  * When the software is not maintained on GitHub:
    1. Find the license document on the software's website and provide the URL pointing to it.
* `author`
  * Use authors from the relevant paper for the tool, if available.
  * If not, refer to the `AUTHORS.md` file (an organization name or person's name) if present in the software's repository.
  * If neither option is available, you may omit the author field.

```json
"details": {
  ...
  "upstreamProjects": [
    {
      "name": "BWA",
      "repoUrl": "https://github.com/lh3/bwa",
      "version": "0.7.15-r1140",
      "license": "GPL-3.0-or-later",
      "licenseUrl": "https://github.com/lh3/bwa/blob/08764215c6615ea52894e1ce9cd10d2a2faa37a6/COPYING",
      "author": "Heng Li"
    },
    {
      "name": "biobambam2",
      "repoUrl": "https://github.com/gt1/biobambam2",
      "version": "2.0.87-release-20180301132713",
      "license": "MIT, GPL-3.0-or-later",
      "licenseUrl": "https://github.com/gt1/biobambam2/blob/5798e74558e001e33855cb93cc8bf149344b931d/COPYING",
      "author": "German Tischler"
    },
    {
      "name": "GNU Gzip",
      "repoUrl": "https://www.gnu.org/software/gzip/",
      "version": "1.6",
      "license": "GPL-3.0-or-later",
      "licenseUrl": "https://www.gnu.org/licenses/gpl-3.0.html",
      "author": "Jean-loup Gailly"
    }
  ],
  ...
}
```
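
A small check can catch missing license metadata before an app is built. The sketch below assumes the parsed `dxapp.json` is available as a Python dictionary; `missing_license_keys` is a hypothetical helper, not part of any DNAnexus tooling.

```python
# Keys this guide lists as required for every upstreamProjects entry.
REQUIRED_KEYS = ("name", "repoUrl", "version", "license", "licenseUrl")


def missing_license_keys(dxapp):
    """Map each upstream project to the required keys it lacks or leaves empty."""
    projects = dxapp.get("details", {}).get("upstreamProjects", [])
    missing = {}
    for position, project in enumerate(projects):
        absent = [key for key in REQUIRED_KEYS if not project.get(key)]
        if absent:
            # Fall back to the list position when the entry has no name.
            label = project.get("name") or "entry {}".format(position)
            missing[label] = absent
    return missing


dxapp = {"details": {"upstreamProjects": [
    {"name": "BWA", "repoUrl": "https://github.com/lh3/bwa", "version": "0.7.15-r1140"},
]}}
print(missing_license_keys(dxapp))  # {'BWA': ['license', 'licenseUrl']}
```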

### Citations

Cite the publications that are associated with the software being used. Use a [DOI](https://www.doi.org/) name to refer to the paper.

```json
"details": {
  ...
  "citations": [
    "doi:10.1093/bioinformatics/btv098",
    "arXiv:1303.3997v2"
  ],
  ...
}
```

As of July 12, 2019, the Platform web UI does not resolve arXiv links, though they can be queried through the CLI.

### Categories (Apps)

Categories are useful for filtering apps from the CLI using `dx find`. An app can have many categories, but only a subset shows up in the web UI. You can assign *any* category to an app(let), but remember that **categories searchable in the UI are defined by DNAnexus**. If you want to add or remove a category from the web UI, contact [DNAnexus Support](mailto:support@dnanexus.com).

### Help

The help text of an optional argument should start with "(Optional)".

```json
{
  ...
  "optional": true,
  "help": "(Optional) Annotations file in Ensembl GTF format.",
  ...
}
```
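
This convention is easy to verify automatically. As a minimal sketch, assuming the `inputSpec` list from `dxapp.json` has been parsed into Python dictionaries (the function name below is hypothetical):

```python
def inputs_missing_optional_prefix(input_spec):
    """Return names of optional inputs whose help does not start with "(Optional)"."""
    return [
        entry["name"]
        for entry in input_spec
        if entry.get("optional") and not entry.get("help", "").startswith("(Optional)")
    ]


input_spec = [
    {"name": "annotations_gtf", "optional": True,
     "help": "(Optional) Annotations file in Ensembl GTF format."},
    {"name": "extra_options", "optional": True, "help": "Extra command-line options."},
    {"name": "reads_fastqgz", "help": "Gzipped FASTQ reads."},
]
print(inputs_missing_optional_prefix(input_spec))  # ['extra_options']
```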

### Specifying Inputs and Outputs (I/O Spec)

Treat the I/O spec of an app(let) like the docstring of a well-documented function. It should be descriptive and provide a good understanding without looking under the hood.

#### Input Variable Ordering

For inputs, use the `group` field to control how options are displayed. Groups are shown in order of first appearance in the input specification, with the unnamed group always appearing first. Group inputs sensibly.
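
To preview the resulting order, the rule described above can be sketched in a few lines. This only models the ordering as stated in this guide (it does not call the Platform), and the empty string is an assumed stand-in for the unnamed group.

```python
def group_display_order(input_spec):
    """Return group names in display order; "" stands for the unnamed group."""
    seen = []
    for entry in input_spec:
        group = entry.get("group", "")
        if group not in seen:
            seen.append(group)
    named = [group for group in seen if group != ""]
    # The unnamed group, when present, always comes first.
    return ([""] if "" in seen else []) + named


input_spec = [
    {"name": "min_quality", "group": "Advanced"},
    {"name": "reads_fastqgz"},
    {"name": "threads", "group": "Common"},
    {"name": "seed_length", "group": "Advanced"},
]
print(group_display_order(input_spec))  # ['', 'Advanced', 'Common']
```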

#### Output Files

When possible, output index files along with the primary file. This includes pairs like BAM + BAI and VCF + TBI.

When outputting a reference index file from an app(let) script, keep the reference name in the generated index. For example:

`referencename.fa.gz` -> indexed by HISAT2 index -> `referencename.hisat2-index.tar.gz` output filename
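
A minimal sketch of this renaming in Python follows. The helper name, the `tool` parameter, and the list of FASTA suffixes are assumptions for illustration; adjust them to the files your app(let) actually accepts.

```python
# Assumed set of FASTA extensions; extend as needed.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna")


def index_output_name(reference_filename, tool="hisat2"):
    """Build the index filename while keeping the reference name."""
    name = reference_filename
    if name.endswith(".gz"):
        name = name[:-len(".gz")]
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            name = name[:-len(suffix)]
            break
    return "{name}.{tool}-index.tar.gz".format(name=name, tool=tool)


print(index_output_name("referencename.fa.gz"))  # referencename.hisat2-index.tar.gz
```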

#### Suggestions

For file type inputs, you can [recommend](https://documentation.dnanexus.com/api/running-analyses/io-and-run-specifications#input-specification) projects containing inputs or specific input files for users. For reference genomes and indexes, suggest the "DNAnexus Reference Genomes" project.

```json
{
  "name": "genomeindex_targz",
  "label": "BWA reference genome index",
  "help": "A file, in gzipped tar archive format, with the reference genome sequence already indexed with BWA.",
  "class": "file",
  "patterns": ["*.bwa-index.tar.gz"],
  "suggestions": [
    {
      "name": "DNAnexus Reference Genomes",
      "project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
      "path": "/"
    }
  ]
}
```

#### Name Specification

App(let) I/O `name` fields should follow the pattern:

`noun[index]_[adjective]_filetype`

For BAM files:

```shell
mappings_bam  # BAM file
mappings_sorted_bam  # Sorted BAM file
mappings_sorted_bai  # Index (BAI) of the sorted BAM file
mappings_readname_bam  # Readname-sorted BAM file
```

For FASTQ files:

```shell
reads_fastqgz  # Forward gzipped reads
reads2_fastqgz  # Reverse gzipped reads
```

For VCF files:

```shell
variants_vcf  # Variants files
variants_vcfgz  # Gzipped variants files
variants_tbi  # Variants index file
```

For reference files:

```shell
reference_targz  # *.tar.gz reference file
reference.tool-index.tar.gz  # bioinformatics tool indexed reference
```
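
The pattern can also be expressed as a small helper, which may be handy when generating many I/O fields. This is an illustrative sketch only; `io_name` and its parameters are hypothetical names.

```python
def io_name(noun, filetype, adjective=None, index=None):
    """Compose an I/O field name as noun[index]_[adjective]_filetype."""
    head = noun if index is None else "{}{}".format(noun, index)
    parts = [head]
    if adjective:
        parts.append(adjective)
    parts.append(filetype)
    return "_".join(parts)


print(io_name("mappings", "bam", adjective="sorted"))  # mappings_sorted_bam
print(io_name("reads", "fastqgz", index=2))            # reads2_fastqgz
```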

## Scripts

### General Guidelines

#### Local Variable Naming

When downloading files from the DNAnexus Platform, preserve the original filenames rather than using hardcoded constants. This practice ensures that error messages and log outputs reference filenames that users recognize from the Platform, making debugging and troubleshooting easier.

Using the actual platform filename also maintains clear traceability between input files and their local counterparts throughout your app's execution. For example:

Good (`src/code.sh`):

```shell
dx download "${sorted_bam}"
samtools view -H "${sorted_bam_name}"
```

Bad (`src/code.sh`):

```shell
dx download $sorted_bam -o sorted.bam
samtools view -H sorted.bam
```

Good (`src/code.py`):

```python
mapping_sorted_bam = dxpy.DXFile(sorted_bam)
sorted_bam_name = mapping_sorted_bam.name
dxpy.download_dxfile(mapping_sorted_bam.get_id(), sorted_bam_name)
# A command given as a single string needs shell=True to be parsed
subprocess.check_call(
    'samtools view -H "{bam_name}"'.format(bam_name=sorted_bam_name),
    shell=True)
```

Bad (`src/code.py`):

```python
mapping_sorted_bam = dxpy.DXFile(sorted_bam)
dxpy.download_dxfile(mapping_sorted_bam.get_id(), "input.bam")
subprocess.check_call('samtools view -H input.bam', shell=True)
```

#### Bash Specifics

Variables used in Bash should always be enclosed in curly braces and double quotes to prevent globbing or word splitting, unless that behavior is intended. This is especially important when constructing filenames and other values:

Good (`src/code.sh`):

```shell
prefix="SRR504516"
sortedbam_name="${prefix}_sorted.bam"
echo "${sortedbam_name}"
# outputs: SRR504516_sorted.bam
```

Bad (`src/code.sh`):

```shell
prefix="SRR504516"
sortedbam_name=$prefix_sorted.bam
echo $sortedbam_name
# outputs: .bam
# uses var prefix_sorted, which doesn't exist
```

### References

References should have filenames which are descriptive of what is in them. This includes references which have been indexed for specific uses. Multiple methods are available for handling references.

### Script Section Commenting

```shell
#
# Section overview
# ---------------------------------------------------------------
# Summary and additional notes on the section, in complete sentences.
#
# Sentences/ideas can be separated by newlines if needed.
#
# Remember that a style guide is a suggestion. As long as you're consistent
# with whatever section/block commenting pattern you use, you're golden.
#
```

## Functions

### Function/Method Naming

`descriptive_lowercase_function_name_separated_by_underscores()`

Functions, like variables, should be all lowercase, with name parts separated by underscores. Names should describe the task being performed in the function body.

Good (`src/code.sh`):

```shell
function split_bam_by_chr() {
    echo "bam filename: ${1}"
    echo "chromosome: ${2}"
    samtools view -b "${1}" "${2}" -o "${1%.bam}_chr${2}.bam"
}
```

Good (`src/code.py`):

```python
def split_bam_by_chr(bam_file, chromosome):
    split_cmd = 'samtools view -b "{bam}" {chr}'.format(
        bam=bam_file, chr=chromosome)
    subprocess.check_call(split_cmd, shell=True)
```

### Descriptions (Doc Strings)

#### Function Descriptions

In general, try to keep functions straightforward and understandable. Use comments for long functions, complex algorithms, or a series of difficult-to-read shell commands. Include descriptions as a block comment in Bash or a docstring in Python (for Python, follow [PEP 257](https://www.python.org/dev/peps/pep-0257/)). When needed, include an overview of function Arguments, Returns, and Exceptions (Raises) in the docstring/comment.

Good (`src/code.sh`):

```shell
#####################################################
# Split BAM by specified chromosome.
#
#  Globals:
#      VIEW_OPT: predefined view cmd options
#  Arguments:
#      $1: bam filename
#      $2: chromosome region to use
#  Returns:
#      name of the generated BAM file
#####################################################
function split_bam_by_chr() {
    # Log to stderr so that stdout carries only the returned filename
    echo "bam filename: ${1}" >&2
    echo "chromosome: ${2}" >&2
    split_bam_name="${1%.bam}_chr${2}.bam"
    samtools view -b "${VIEW_OPT}" "${1}" "${2}" -o "${split_bam_name}"
    echo "${split_bam_name}"
}
```

Good (`src/code.py`):

```python
def split_bam_by_chr(bam_filename, chromosome):
    """Create a BAM file from a specified chromosome.

    Notes:
        Docstrings follow Google best practices. Again, a style is just
        a suggestion; the most important thing is to be consistent!

        A side note specifically for Python: following a style guide for
        docstrings allows documentation generators like Sphinx to parse
        them and compile autodocs.

    Args:
        bam_filename (str): BAM filename.
        chromosome (str): Chromosome region to split into its own BAM.

    Returns:
        None

    Raises:
        CalledProcessError: If subprocess.check_call() fails.
    """
    # Drop the ".bam" extension, mirroring "${1%.bam}" in the Bash version
    bam_prefix = bam_filename
    if bam_prefix.endswith(".bam"):
        bam_prefix = bam_prefix[:-len(".bam")]
    split_bam_name = "{bam}_chr{chr}.bam".format(
        bam=bam_prefix, chr=chromosome)
    split_cmd = 'samtools view -b "{bam}" {chr} -o "{outbam}"'.format(
        bam=bam_filename, chr=chromosome, outbam=split_bam_name)
    subprocess.check_call(split_cmd, shell=True)
```

## Supplementary Information

### Building Commands

In app(let)s, commands (executed via `subprocess` in Python) are often constructed in different ways depending on user input. Incorrect command construction can lead to unexpected failures and results due to word splitting and globbing. In Python, use `str.format()` to build commands and remember to escape special characters. In Bash, use arrays to construct commands.

Good (`src/code.sh`):

Use arrays and correct quoting to build commands:

```shell
options=()
options+=("view")
options+=("-c")
options+=("bam with space.bam") # Quotes prevent unwanted word splits

samtools "${options[@]}" # Quotes prevent re-splitting of elements
```

Good (`src/code.py`):

```python
cmd = "samtools view {options} \"{bam_file}\"".format(
    options="-c", bam_file="bam with space.bam")
# The escaped quotes prevent word splitting when the shell runs the command
```
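
Hand-written quotes cover spaces but not every special character (for example, a filename that itself contains a double quote). One option, shown here as a sketch and not as the only accepted approach, is to let the standard library's `shlex.quote()` do the escaping. The helper name `build_view_command` is hypothetical.

```python
import shlex


def build_view_command(bam_file, options=()):
    """Build a shell-safe `samtools view` command string."""
    quoted_options = " ".join(shlex.quote(option) for option in options)
    return "samtools view {options} {bam_file}".format(
        options=quoted_options, bam_file=shlex.quote(bam_file))


cmd = build_view_command("bam with space.bam", options=["-c"])
print(cmd)  # samtools view -c 'bam with space.bam'
```

The resulting string can be passed to `subprocess.check_call(cmd, shell=True)`; the filename reaches `samtools` as a single argument.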
