Third Party App Style Guide
This document provides guidelines for app and applet development. While this guide outlines best practices, we also recommend following industry-standard style guides to ensure that code is clean and maintainable.
As with any style guide, if you have a good reason, or if the code you are extending follows a different convention, it is okay, even recommended, to deviate from this guide.
dxapp.json
The dxapp.json JSON file establishes the convention that users of the DNAnexus Platform use to create app(let)s on the platform. When defining applications for wide use, this style guide sets standards for user-friendly UI, CLI, and runtime app(let) definitions.
Name
Applet names should be all lowercase with words separated by underscores.
"name": "hisat2_mapper"
Summary
An app summary must be concise. It should fit on one line and not terminate in a period. The current app(let) is the assumed subject of the summary.
Good
"summary": "Merges multiple BAM files into a single one"
"summary": "Maps FASTQ reads (paired or unpaired) to a reference genome with the BWA-MEM algorithm"
Bad
"summary": "This app takes multiple BAM file inputs. Then uses a SAMtools(v1.3.1) to merge and output a single BAM file."
The subject "This app" is unnecessary. The sentence is too verbose in its explanation and should rely on the subject being assumed.
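The summary rules above can be checked mechanically. The following is a minimal sketch, not part of any DNAnexus tooling: the helper name `summary_problems` and the list of openers it treats as "naming the app as the subject" are assumptions made for illustration.

```python
def summary_problems(summary):
    """Return a list of style problems found in a dxapp.json summary."""
    problems = []
    if "\n" in summary.strip():
        problems.append("spans more than one line")
    if summary.rstrip().endswith("."):
        problems.append("ends with a period")
    # Assumption: these openers signal that the app is named as the subject.
    if summary.lower().startswith(("this app", "this applet")):
        problems.append("names the app as the subject")
    return problems


print(summary_problems("Merges multiple BAM files into a single one"))
# []
print(summary_problems("This app takes multiple BAM file inputs."))
# ['ends with a period', 'names the app as the subject']
```

A check like this could run before building the app, but it only catches the mechanical rules; conciseness still needs a human reader.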
App Execution Environment
Make sure the app runs in the Python 3 App Execution Environment by setting runSpec accordingly. For example, to run a Python app in the Ubuntu 24.04 Python 3 environment, specify:
"runSpec": {"release": "24.04", "version": "0", "interpreter": "python3", ...}
and a Bash app in Ubuntu 24.04 Python3 environment:
"runSpec": {"release": "24.04", "version": "0", "interpreter": "bash", ...}
Licenses
In the upstreamProjects property of the dxapp.json, include the licenses of the dependency software and packages that are installed. If the software you explicitly installed brings in additional, hidden layers of dependencies, it is the package author's responsibility to list the appropriate licenses.
The following keys are required to ensure compliance with open-source licenses: name, repoUrl, version, license, and licenseUrl. The author key is optional but good to have.
Take note of the following:
license: See SPDX standards for license abbreviations.
licenseUrl:
    When the software is maintained on GitHub:
        Find the LICENSE or COPYING file link from the software version's tag/commit.
        Use a "permalink" as the static hyperlink pointing to the particular version applied. Press "y" in the browser to get the permanent link (see GitHub's "Getting permanent links to files").
    When the software is not maintained on GitHub:
        Find the license document on the software's website and provide a URL pointing to it.
author:
    First, use the authors from the relevant paper for the tool.
    Second, use the AUTHORS.md file (an org name or person's name) if present in the software's repository.
    Third, if neither is available, provide no author.
"details": {
...
"upstreamProjects": [
{
"name": "BWA",
"repoUrl": "https://github.com/lh3/bwa",
"version": "0.7.15-r1140",
"license": "GPL-3.0-or-later",
"licenseUrl": "https://github.com/lh3/bwa/blob/08764215c6615ea52894e1ce9cd10d2a2faa37a6/COPYING",
"author": "Heng Li"
},
{
"name": "biobambam2",
"repoUrl": "https://github.com/gt1/biobambam2",
"version": "2.0.87-release-20180301132713",
"license": "MIT, GPL-3.0-or-later",
"licenseUrl": "https://github.com/gt1/biobambam2/blob/5798e74558e001e33855cb93cc8bf149344b931d/COPYING",
"author": "German Tischler"
},
{
"name": "GNU Gzip",
"repoUrl": "https://www.gnu.org/software/gzip/",
"version": "1.6",
"license": "GPL-3.0-or-later",
"licenseUrl": "https://www.gnu.org/licenses/gpl-3.0.html",
"author": "Jean-loup Gailly"
}
],
...
}
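The required keys listed above lend themselves to an automated check. The sketch below assumes the dxapp.json has already been loaded into a dictionary (for example with `json.load`); the helper name `missing_license_keys` is hypothetical and not part of the DNAnexus SDK.

```python
REQUIRED_KEYS = ("name", "repoUrl", "version", "license", "licenseUrl")


def missing_license_keys(dxapp):
    """Map each upstream project to the required keys it lacks."""
    projects = dxapp.get("details", {}).get("upstreamProjects", [])
    missing = {}
    for index, project in enumerate(projects):
        absent = [key for key in REQUIRED_KEYS if not project.get(key)]
        if absent:
            # Fall back to the list position when the name itself is missing.
            label = project.get("name") or "upstreamProjects[{}]".format(index)
            missing[label] = absent
    return missing


dxapp = {
    "details": {
        "upstreamProjects": [
            {
                "name": "BWA",
                "repoUrl": "https://github.com/lh3/bwa",
                "version": "0.7.15-r1140",
                "license": "GPL-3.0-or-later",
            }
        ]
    }
}
print(missing_license_keys(dxapp))  # {'BWA': ['licenseUrl']}
```

The optional author key is deliberately left out of the check, since the guide allows omitting it.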
Citations
Cite the publications that are associated with the software being used. Use a DOI name to refer to the paper.
"details": {
...
"citations": [
"doi:10.1093/bioinformatics/btv098",
"arXiv:1303.3997v2" # As of 20190712, platform webUI does not resolve arXiv links, though it can be queried through CLI
]
...
}
Categories (Apps)
Categories are great for filtering applets from the CLI using dx find. While an app can have many categories, only a subset will show up in the web UI. You can assign any category to an app(let), but remember that the categories searchable in the UI are defined by DNAnexus. If you want to add or remove a category from the web UI, contact DNAnexus Support.
Help
Optional arguments should start with "(Optional)".
{
...
"optional": true,
"help": "(Optional) Annotations file in Ensembl GTF format.",
...
}
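One way to enforce this convention is to scan the input specification for optional entries whose help text lacks the prefix. This is a minimal sketch: the function name and the sample input names are hypothetical, and the input specification is assumed to be the list of dictionaries found under inputSpec in a loaded dxapp.json.

```python
def optional_inputs_missing_prefix(input_spec):
    """Return names of optional inputs whose help lacks the "(Optional)" prefix."""
    return [
        entry.get("name", "<unnamed>")
        for entry in input_spec
        if entry.get("optional")
        and not entry.get("help", "").startswith("(Optional)")
    ]


input_spec = [
    {"name": "reads_fastqgz", "help": "Gzipped FASTQ reads."},
    {
        "name": "annotations_gtf",
        "optional": True,
        "help": "(Optional) Annotations file in Ensembl GTF format.",
    },
    {"name": "extra_options", "optional": True, "help": "Extra options."},
]
print(optional_inputs_missing_prefix(input_spec))  # ['extra_options']
```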
Specifying Inputs and Outputs (I/O Spec)
Treat the I/O spec of an app(let) like the docstring of a well-documented function. It should be descriptive and provide a good understanding without looking under the hood.
Input Variable Ordering
For inputs, use the group field to dictate how options will be shown. Groups will be shown in order of first appearance in the input specification, with the unnamed group always appearing first. Strive to sensibly group inputs.
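To preview how a given input specification would be grouped, the ordering rule described above can be reproduced in a few lines. This is only a sketch of the rule as stated here, not the platform's actual rendering code; the function name and the sample input names are hypothetical.

```python
def inputs_by_display_group(input_spec):
    """Group input names in display order: unnamed group first, then by first appearance."""
    grouped = {"": []}  # "" stands for the unnamed group, which is shown first
    for entry in input_spec:
        group = entry.get("group") or ""
        # dicts keep insertion order, so groups stay in order of first appearance
        grouped.setdefault(group, []).append(entry["name"])
    return [(group, names) for group, names in grouped.items() if names]


input_spec = [
    {"name": "min_quality", "group": "Advanced"},
    {"name": "reads_fastqgz"},
    {"name": "output_prefix", "group": "Output"},
    {"name": "threads", "group": "Advanced"},
]
for group, names in inputs_by_display_group(input_spec):
    print(group or "(unnamed)", names)
```

In this example the unnamed group is listed first even though its only input appears second in the specification.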
Output Files
When possible, output index files along with the primary file. This includes pairs like BAM + BAI and VCF + TBI.
When outputting a reference index file from an app(let) script, keep the reference name in the generated index. For example:
referencename.fa.gz -> indexed by HISAT2 -> referencename.hisat2-index.tar.gz (output filename)
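Deriving the index filename from the reference filename can be done with a small helper. The sketch below is one possible approach: the function name `index_filename` and the list of FASTA extensions it strips are assumptions, so extend the list to match the references you actually handle.

```python
def index_filename(reference_filename, tool):
    """Build "<reference>.<tool>-index.tar.gz" from a FASTA filename."""
    name = reference_filename
    # Strip the compression extension first, then the FASTA extension.
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for suffix in (".fasta", ".fa", ".fna"):  # assumed list of FASTA extensions
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return "{}.{}-index.tar.gz".format(name, tool)


print(index_filename("referencename.fa.gz", "hisat2"))
# referencename.hisat2-index.tar.gz
```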
Suggestions
For file type inputs you can recommend projects containing inputs or specific input files for users. For reference genomes and indexes suggest the "DNAnexus Reference Genomes" project.
{
"name": "genomeindex_targz",
"label": "BWA reference genome index",
"help": "A file, in gzipped tar archive format, with the reference genome sequence already indexed with BWA.",
"class": "file",
"patterns": ["*.bwa-index.tar.gz"],
"suggestions": [
{
"name": "DNAnexus Reference Genomes",
"project": "project-BQpp3Y804Y0xbyG4GJPQ01xv",
"path": "/"
}
]
}
Name Specification
App(let) I/O name fields should follow the pattern:
noun[index]_[adjective]_filetype
For BAM files:
mappings_bam # BAM files
mappings_sorted_bam # Sorted BAM file
mappings_sorted_bai # BAI index of the sorted BAM file
mappings_readname_bam # Readname-sorted BAM file
For FASTQ files:
reads_fastqgz # Forward gzipped reads
reads2_fastqgz # Reverse gzipped reads
For VCF files:
variants_vcf # Variants files
variants_vcfgz # Gzipped variants files
variants_tbi # Variants index file
For reference files:
reference_targz # *.tar.gz reference file
reference.tool-index.tar.gz # bioinformatics tool indexed reference
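The noun[index]_[adjective]_filetype pattern for I/O name fields can be expressed as a regular expression. The sketch below is one interpretation of that pattern, not an official validator: it assumes lowercase names, a numeric index, and at most one adjective, and the names `IO_NAME_RE` and `parse_io_name` are hypothetical.

```python
import re

# Assumption: lowercase names with at most one adjective between noun and file type.
IO_NAME_RE = re.compile(
    r"(?P<noun>[a-z]+)(?P<index>\d*)"
    r"(?:_(?P<adjective>[a-z0-9]+))?"
    r"_(?P<filetype>[a-z0-9]+)"
)


def parse_io_name(name):
    """Split an I/O name into its parts, or return None if it does not fit."""
    match = IO_NAME_RE.fullmatch(name)
    return match.groupdict() if match else None


print(parse_io_name("reads2_fastqgz"))
print(parse_io_name("mappings_sorted_bam"))
print(parse_io_name("SortedBam"))  # None
```

Names that carry no index come back with an empty string for index, and names with no adjective come back with None for adjective.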
Scripts
General Guidelines
Local Variable Naming
Download files under their platform filenames rather than under a constant name, so that any errors caused by the input contain filenames familiar to users. For example:
Good (src/code.sh):
dx download "${sorted_bam}"
samtools view -H "${sorted_bam_name}"
Bad (src/code.sh):
dx download $sorted_bam -o sorted.bam
samtools view -H sorted.bam
Good (src/code.py):
mapping_sorted_bam = dxpy.DXFile(sorted_bam)
sorted_bam_name = mapping_sorted_bam.name
dxpy.download_dxfile(mapping_sorted_bam.get_id(), sorted_bam_name)
# Pass an argument list so that filenames with spaces are not split
subprocess.check_call(["samtools", "view", "-H", sorted_bam_name])
Bad (src/code.py):
mapping_sorted_bam = dxpy.DXFile(sorted_bam)
dxpy.download_dxfile(mapping_sorted_bam.get_id(), "input.bam")
subprocess.check_call(["samtools", "view", "-H", "input.bam"])
Bash Specifics
Variables used in Bash should always be enclosed in braces and double quotes to prevent globbing or word splitting, unless that behavior is intended. This is especially important when constructing filenames and other values:
Good (src/code.sh):
prefix="SRR504516"
sortedbam_name="${prefix}_sorted.bam"
echo "${sortedbam_name}"
# outputs: SRR504516_sorted.bam
Bad (src/code.sh):
prefix="SRR504516"
sortedbam_name=$prefix_sorted.bam
echo $sortedbam_name
# outputs: .bam
# uses var prefix_sorted, which doesn't exist
References
References should have filenames that describe their contents, including references that have been indexed for specific uses. There are multiple ways in which references are handled.
Script Section Commenting
#
# Section overview
# ---------------------------------------------------------------
# Summary and additional notes on the section, in complete sentences.
#
# Sentences/ideas can be separated by newlines if needed.
#
# Remember, a style guide is JUST a suggestion. As long as you're consistent
# with whatever section/block commenting pattern you use, you're golden.
#
Functions
Function/Method Naming
descriptive_lowercase_function_name_separated_by_underscores()
Functions, like variables, should be all lowercase, with name parts separated by underscores. Names should describe the task being performed in the function body.
Good (src/code.sh):
function split_bam_by_chr () {
    echo "bam filename: $1"
    echo "chromosome: $2"
    samtools view -b "$1" "$2" -o "${1%.bam}_chr${2}.bam"
}
Good (src/code.py):
def split_bam_by_chr(bam_file, chromosome):
    split_cmd = "samtools view -b {bam} {chr}".format(bam=bam_file,
                                                      chr=chromosome)
    subprocess.check_call(split_cmd, shell=True)
Descriptions (Doc Strings)
Function Descriptions
In general, try to keep functions simple and easy to understand. Comments should be used for: long functions, complex algorithms, or a series of difficult to read shell commands. Descriptions should be included as a block comment in bash or a docstring in Python (for Python follow PEP 257). When needed, include an overview of function Arguments, Returns, and Exceptions (Raises) in the docstring/comment.
Good (src/code.sh):
#####################################################
# Split BAM by specified chromosome.
#
# Globals:
#   VIEW_OPT: predefined view cmd options
# Arguments:
#   $1: bam filename
#   $2: chromosome region to use
# Returns:
#   name of the generated BAM file (printed to stdout)
#####################################################
function split_bam_by_chr() {
    # Log to stderr so that stdout carries only the returned filename
    echo "bam filename: $1" >&2
    echo "chromosome: $2" >&2
    split_bam_name="${1%.bam}_chr${2}.bam"
    samtools view -b "$VIEW_OPT" "$1" "$2" -o "$split_bam_name"
    echo "${split_bam_name}"
}
Good (src/code.py):
def split_bam_by_chr(bam_filename, chromosome):
    """Create a BAM file from a specified chromosome.

    Notes:
        Docstrings follow Google best practices. Again, a style is just
        a suggestion; the most important thing is... Be consistent!
        A side note worth mentioning specifically for Python:
        following style guides for commenting allows documentation
        generators such as Sphinx to parse docstrings and compile autodocs.

    Args:
        bam_filename (str): BAM filename.
        chromosome (str): Chromosome region to split into its own BAM.

    Returns:
        None

    Raises:
        CalledProcessError: If subprocess.check_call() fails
    """
    split_bam_name = "{bam}_chr{chr}.bam".format(
        bam=bam_filename, chr=chromosome)
    split_cmd = "samtools view -b {bam} {chr} -o {outbam}".format(
        bam=bam_filename, chr=chromosome, outbam=split_bam_name)
    subprocess.check_call(split_cmd, shell=True)
Supplementary Information
Building Commands
In app(let)s, the commands that are executed (via subprocess in Python) are constructed in different ways based on user input. Incorrect command construction can lead to unexpected failures and results due to word splitting and globbing. In Python, use str.format() to build commands and remember to escape special characters; in Bash, use arrays to construct commands.
Good (src/code.sh):
# Use arrays and correct quoting to build commands
options=()
options+=("view")
options+=("-c")
options+=("bam with space.bam") # Quotes prevent unwanted word splits
samtools "${options[@]}" # Quotes prevent re-splitting of elements
Good (src/code.py):
cmd = "samtools view {options} \"{bam_file}\"".format(options="-c", bam_file="bam with space.bam")
# The escaped quotes prevent word splitting in the subshell
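Hand-written escaped quotes protect against spaces but not against every special character (a filename containing a double quote would still break the command). As an alternative sketch, the standard library's shlex.quote() can escape each argument before the command string is assembled. The helper name `build_view_command` is hypothetical, and this is a suggestion rather than a method prescribed by this guide.

```python
import shlex


def build_view_command(bam_file, options=("-c",)):
    """Return a shell-safe "samtools view" command string."""
    parts = ["samtools", "view", *options, bam_file]
    # shlex.quote() adds quoting only where the shell would need it.
    return " ".join(shlex.quote(part) for part in parts)


cmd = build_view_command("bam with space.bam")
print(cmd)  # samtools view -c 'bam with space.bam'
```

The resulting string can be passed to subprocess.check_call(cmd, shell=True). When no shell features such as pipes are needed, passing the argument list directly to subprocess without shell=True avoids the quoting problem altogether.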