DNAnexus Documentation

Copyright 2025 DNAnexus


Developing Apps and Applets

Last updated 2 months ago


What's the difference between an app and an applet?

Applets and apps are both executables on the platform that you can run in the cloud, and you can mix and match them in the same workflow. Both can be specified to require special permissions to the project context or to the user's other projects. Here's how they differ:

Model
  • Applets can be used as scripts for manipulating data, creating proprietary analysis pipelines, or testing versions before publishing an app; they are easy to create and revise.
  • Apps are general-purpose tools of interest to the community at large that usually strive for compatibility, reproducibility, and robustness.

Platform Representation
  • Applets are data objects residing in projects.
  • Apps are created from applets and reside in separate data containers outside of users' projects; there is one container per version of an app.

Input/Output Specifications
  • Applets can be created with no input or output specifications; such an applet can be run with any input and makes no guarantees about what it returns.
  • Apps must have input and output specifications so that they behave predictably and can be used compatibly with other apps.

Sharing
  • Applets are shared by sharing the project in which they live, and are otherwise completely private.
  • Apps are shared by publishing them to a customizable list of authorized users, who can easily discover (via the website or command line), install, and run them.

Open/Closed Source
  • Applets expose their source code and attached resources to anyone who has VIEW permission on their project.
  • Apps can hide their source code and attached resources so that only the app's developers can access them.

Naming
  • Applets can be given any name.
  • Apps belong to a global namespace; the first time someone creates an app with a particular name, that name is reserved for them and for anyone they designate as a developer.

Versioning
  • For applets, each revision has a permanent unique ID, recorded in any job started and any data produced.
  • Apps are published with semantic version numbers (xx.yy.zz), with different versions automatically archived and accessible to platform users.

Want to read more? Check out the API documentation for apps and applets.

So should I be building an applet or an app?

If you are new to writing applications for the DNAnexus platform, start by building applets. They are easier to iterate on quickly and to share directly with collaborators in your project. As long as you write your applet with both an input and an output specification (which we recommend as good practice in general), it can always be built into an app later.

How do I package a Linux executable into an app(let)?

If you haven't already, you may want to take a look at the Intro to Building Apps tutorial, which walks you through creating an applet that takes in a file and outputs another file.

A brief overview of the steps involved:

  1. Run dx-app-wizard:

    • Pick some app name (e.g. appname)

    • Specify any input files or other parameters needed for the executable

    • Specify any output files or values that the executable generates

    • Pick bash as the language

    • Pick the basic template

  2. Place the executable you want to run into the appname/resources directory.

  3. Add a line in the .sh file in the appname/src directory to run the executable on the file with any parameters received from the input. (Use the lines generated by the app wizard for automatically downloading any file input and uploading file output.)

  4. Run either dx build or dx build --create-app (depending on whether you want to build an applet or an app).
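
As a rough sketch of what the wizard produces, a minimal dxapp.json for such an applet might look like the following (the name, input/output parameter names, and Ubuntu release here are illustrative placeholders, not values the wizard is guaranteed to emit):

```json
{
  "name": "appname",
  "title": "appname",
  "dxapi": "1.0.0",
  "version": "0.0.1",
  "inputSpec": [
    {"name": "input_file", "class": "file", "label": "Input file"}
  ],
  "outputSpec": [
    {"name": "output_file", "class": "file", "label": "Output file"}
  ],
  "runSpec": {
    "interpreter": "bash",
    "file": "src/appname.sh",
    "distribution": "Ubuntu",
    "release": "24.04"
  }
}
```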

How do I install software requirements for my app (e.g. Java, R, samtools)?

There are a few different options, depending on where the software can be found and if you would like it to be reproducible.

  1. If the software is available as a package in APT (the package manager used by Ubuntu), you can edit the dxapp.json file to specify it (and, optionally, the version you want). The following JSON excerpt requests APT packages for Java, R, and samtools, which will then be available when the app is run:

    { "runSpec": {
        "execDepends": [
          {"name": "openjdk-6-jre", "version": "6b24-1.11.1-4ubuntu2"},
          {"name": "r-base-core"},
          {"name": "samtools"}
        ]
      }
    }

    Note that the APT "Recommends" of the packages are not installed by default. You can simulate this behavior on your local Ubuntu machine with the --no-install-recommends option to apt-get install.

  2. If the software resides in a globally accessible location (e.g. a git repository hosted on GitHub), you must also request network access to the server hosting it. The following JSON excerpt shows how to request access to github.com and automatically clone and build a repository hosted there. (Alternatively, you can just request access to a particular host and perform the download and build steps manually as part of your app.)

    { "access": {
        "network": ["github.com"]
      },
      "runSpec": {
        "execDepends": [
          {"name": "dx-toolkit",
           "package_manager": "git",
           "url": "git@github.com:dnanexus/dx-toolkit.git",
           "tag": "master",
           "build_commands": "make install DESTDIR=/ PREFIX=/opt/dnanexus"
          }
        ]
      }
    }
  3. If it is software that you already have and would like to upload and install as part of the app, place any necessary files in the resources directory of your app before running dx build. The build tool will compress and package the contents of that directory as part of your app; when the app is run, the archive is automatically downloaded and extracted into the root directory /. The first steps in your app's code should then perform any build or installation commands necessary.

You can also build an Asset Bundle, which works with software from APT, GitHub, and other sources; see the Asset Build Process page for more details.

NOTE: The following Java 8 packages are not available via traditional apt-get; however, we have manually injected them into our APT repo. As a result, you can install them using method #1: the runSpec field of the app's dxapp.json file.

  • openjdk-8-dbg

  • openjdk-8-demo

  • openjdk-8-doc

  • openjdk-8-jdk

  • openjdk-8-jre

  • openjdk-8-jre-headless

  • openjdk-8-jre-jamvm

  • openjdk-8-jre-zero

  • openjdk-8-source
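
For example, installing one of these packages via method #1 would look like this in dxapp.json (a sketch mirroring the excerpt above; pinning a version is optional):

```json
{ "runSpec": {
    "execDepends": [
      {"name": "openjdk-8-jre-headless"}
    ]
  }
}
```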

How do I run newer NVIDIA GPU-accelerated software in my app?

If you are trying to use newer NVIDIA GPU-accelerated software, you may find that the NVIDIA GPU kernel-mode driver (nvidia.ko), which is installed outside of the application execution environment, does not support the newer CUDA version your application requires. You can request a newer NVIDIA driver using the nvidiaDriver field as described in the Run Specification documentation, or you can install packages for the newer CUDA version by creating a DNAnexus asset that includes NVIDIA Forward Compatibility packages and then using that asset in your app, as follows:

$ cat dxasset.json
{
  "name": "nvidia_forward_compatability_asset",
  "title": "nvidia_forward_compatability_asset",
  "description": "nvidia_forward_compatability_asset",
  "version": "0.0.1",
  "distribution": "Ubuntu",
  "release": "24.04",
  "instanceType": "mem2_ssd1_gpu_x16"
}

# Note that the Makefile lines below the "all:" heading are prefixed with tabs
$ cat Makefile
SHELL=/bin/bash -e -x -o pipefail
all:
	sudo mv /etc/apt/apt.conf.d/99dnanexus /tmp/
	wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
	sudo dpkg -i cuda-keyring_1.1-1_all.deb
	sudo apt-get update
	# example below specifies CUDA version 12.5.
	# nvidia provides forward compatibility packages for other CUDA versions as well
	sudo DEBIAN_FRONTEND=noninteractive apt-get -y install cuda-toolkit-12-5 cuda-compat-12-5
	sudo bash -c "echo /usr/local/cuda/compat > /etc/ld.so.conf.d/nvidia-compat.conf"
	sudo ldconfig
	sudo rm cuda-keyring_1.1-1_all.deb
	sudo mv /tmp/99dnanexus /etc/apt/apt.conf.d/99dnanexus
$ dx build_asset
  ...
* nvidia_forward_compatability_asset (create_asset_focal:main) (done) job-xxxx
  testuser 2024-07-01 18:00:00 (runtime 0:17:07)
  Output: asset_bundle = record-GpPQVk80XzzYxZP5Z0J74f7k
 
# Use the asset_bundle created above in the assetDepends field in your app's dxapp.json
$ cd myapp
$ cat dxapp.json
...
 "runSpec": {
  "regionalOptions": {
    "aws:us-east-1": {
      "systemRequirements": {"*": {"instanceType": "mem2_ssd1_gpu_x16"}},
      "assetDepends": [{"id": "record-GpPQVk80XzzYxZP5Z0J74f7k"}]
    }
  }
...

How do I write my app in my favorite programming language?

Python and bash are fully supported via dx-app-wizard templates, client libraries and tools, and sample code.

If you would like to use a language that is not yet supported, you can still do so by packaging your scripts or files with a Python or bash script that is responsible for running your code. See the Intro to Building Apps tutorial for an example of how to package an arbitrary Linux executable as an applet. If you need to install extra software dependencies such as Java or R, see the answer above on installing software requirements.

How do I request more memory/CPU for my app? How do I specify the compute instance type?

By default, a job will run on a virtual machine with these resources:

  • Dual-core x86-64 CPU

  • 7.5 GB of RAM

  • 400 GB scratch file system

To request more resources, edit the dxapp.json file to specify an instance type for each entry point of your app; see the Run Specification documentation for the list of available instance types and other details. The following dxapp.json excerpt requests larger virtual machines for both the main and myEntryPoint entry points; any other entry points not listed in systemRequirements will use the default instance type, mem2_hdd2_x2.

{ "runSpec": {
    "systemRequirements": {
      "main": {
        "instanceType": "mem2_hdd2_x4"
      },
      "myEntryPoint": {
        "instanceType": "mem3_hdd2_x2"
      }
    }
  }
}

What are the default user limits for processes running inside the Linux execution environment?

The default limits, as presented by the output of ulimit -a, are the following:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59461
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 59461
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

How do I request network access for my app?

You will need to modify your dxapp.json file so that the key access.network contains a list of allowed domain names; the special value "*" grants access to everything. The following excerpt requests access to everything:

{ 
  "access": {
    "network": ["*"]
  }
}

The following, meanwhile, limits access to GitHub and Google:

{
  "access": {
    "network": ["github.com", "google.com"]
  }
}

How do I parallelize my app?

You can have your app launch a subjob on another machine in the cloud by calling an entry point that you have specified in the bash or Python script that runs your app. To add entry points to your code, you can either generate the parallelized code template using dx-app-wizard (provided in the SDK) to get you started, or add them manually, as in the code examples below. More in-depth tutorials can be found under Developer Tutorials.

Bash:

# Anything outside the function declarations is always run

myfunc() {
        echo $myinput
}

main() {
        # main gets run when you run the app/applet

        # The following line creates a new job running "myfunc" which
        # will receive an input variable $myinput set to "hello world"

        dx-jobutil-new-job myfunc -imyinput='hello world'
}
Python:

import dxpy

@dxpy.entry_point("myfunc")
def myfunc(myinput):
    print(myinput)

@dxpy.entry_point("main")
def main():
    # main gets run when you run the app/applet

    # The following line creates a new job running "myfunc" which
    # will receive an input variable myinput set to "hello world"

    dxpy.new_dxjob(fn_input={ "myinput": "hello world" }, fn_name="myfunc")

# The following line will call the appropriate entry point.
dxpy.run()

What are DNAnexus links, and how are they different from using the data object IDs?

DNAnexus links are JSON hashes containing a data object ID (or, optionally, a project ID as well). The following two hashes are both valid DNAnexus links (the second is called an extended DNAnexus link).

{
    "$dnanexus_link": "file-B37v04Q7z11654j7gjj000F8"
}
{
    "$dnanexus_link": {
        "project": "project-B2f3Pz87z115j6K7yb40001V",
        "id": "file-B37v04Q7z11654j7gjj000F8"
    }
}
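
In the client libraries you would normally use a helper (dxpy provides dxpy.dxlink, for instance) rather than writing the hash by hand. The following stand-alone sketch shows the shape of both link forms; make_link is an illustrative helper, not a platform API:

```python
import json

def make_link(object_id, project_id=None):
    """Build a DNAnexus link hash (illustrative helper, not a platform API).

    With only an object ID this yields a plain link; adding a project ID
    yields an extended link, as described above.
    """
    if project_id is None:
        return {"$dnanexus_link": object_id}
    return {"$dnanexus_link": {"project": project_id, "id": object_id}}

# Plain link
print(json.dumps(make_link("file-B37v04Q7z11654j7gjj000F8"), indent=4))
# Extended link
print(json.dumps(make_link("file-B37v04Q7z11654j7gjj000F8",
                           "project-B2f3Pz87z115j6K7yb40001V"), indent=4))
```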

There are two contexts in which you will want to use DNAnexus links instead of string IDs:

  1. JSON details of a data object

  2. Job input/output

JSON details

Before closing a data object, you have the option to include non-extended DNAnexus links (i.e. ID-only links) in the details of the object. This allows you to link to an existing data object (e.g. a reference genome) that was used to compute the new data object (e.g. a reference genome indexed for a particular mapper like BWA), and lets you search your data objects by what objects they link to.

How do they affect cloning (copying between projects)?

Any linked data objects that are also hidden will be copied and moved along with their parent data object.

Job input/output

If an app(let) requires an input or output to be of a data object class (e.g. file), then the value in the input/output JSON hash must be given as a DNAnexus link. There are convenient utility functions in some of the client libraries and command-line tools, so you don't usually have to formulate the hash yourself.

How do they affect job execution?

Before a job created from running an app or applet (an origin or master job) is started, any data objects found as DNAnexus links in its input hash must be closed. Thus, if you run an app on an open file, the job will not start until the file has been closed. The input data objects are then cloned into the new job's temporary workspace right before the job starts running.

Similarly, if an origin or master job has DNAnexus links in its output hash, the referenced objects must be closed before they can be cloned as output into the parent container. If they are still open when all jobs in the tree have finished running, the platform will fail the job; if any objects are still closing, the platform waits until they finish closing before cloning them as output and marking the job as done.

What are job-based object references (JBORs), and how can I use them when running apps?

Job-based object references (JBORs) are JSON hashes with two pieces of information:

  • a job ID (under the key "job")

  • the name of an output the job is expected to provide (under the key "field")

They can be used in place of an input value to a new job or an output value of an existing job. Once the referenced job finishes successfully, the JBOR will be replaced with the value found in the referenced job's output. If the referenced job fails or does not provide the requested output, then the job waiting for the value will also fail.

JSON Syntax

{
    "job": "job-xxxx",
    "field": "ref_output_field"
}
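
Conceptually, once the referenced job succeeds, the platform substitutes the JBOR with the named field from that job's output hash. The following stand-alone sketch illustrates that substitution; resolve_jbor and the sample output hash are illustrative, since the platform performs this lookup automatically:

```python
def resolve_jbor(jbor, job_outputs):
    """Replace a JBOR with the referenced job's output value (illustrative).

    job_outputs maps job IDs to their output hashes. A missing job or
    missing field raises KeyError, mirroring how a job waiting on a JBOR
    fails if the referenced job fails or omits the requested output.
    """
    out = job_outputs[jbor["job"]]
    return out[jbor["field"]]

# A finished job's output hash, containing a DNAnexus link
outputs = {"job-xxxx": {"ref_output_field": {"$dnanexus_link": "file-B37v04Q7z11654j7gjj000F8"}}}
print(resolve_jbor({"job": "job-xxxx", "field": "ref_output_field"}, outputs))
```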

UI

Inside a single workflow, you can use the output of one job as the input of another by dragging the first job's output onto the second job's input.

Command line

$ dx run someapp -iinput=job-xxxx:ref_output_field

For more information on how to use JBORs when writing apps, see the Sample Code page.

How do I view the stdout/stderr and any other logs for a job?

Through the Web UI

  1. Navigate to the project in which you ran the job

  2. Click on the Monitor tab

  3. Click on the name of the top-level job

  4. Click on the job for which you want the logs

Through the Command Line

$ dx watch job-xxxx

Where can I find example code for writing applications for the platform?

If you would like to see more example code, you can use the dx get command to reconstruct and download the source directory of open-source apps (e.g. dx get app-cloud_workstation). You can list open-source apps with the command below:

$ dx api system findApps '{"describe":{"fields":{"openSource": true, "name": true}}}'| \
  jq '.results|.[]|select(.describe.openSource)|.describe.name'

There are also many helpful code snippets on the Sample Code page, listed by programming language.

