DNAnexus Documentation

Copyright 2025 DNAnexus

Python Apps

Learn to build a simple Python app on the DNAnexus Platform.

Last updated 2 years ago

This tutorial will demonstrate:

  • Writing, deploying, running, and monitoring apps in Python

  • Using the DNAnexus Platform APIs to represent and store your data

This tutorial assumes that you have already installed the DNAnexus SDK and worked through Intro to Building Apps. Refer back to that tutorial as necessary.

Before You Begin

To initialize the SDK environment, open your command line terminal, navigate to the directory where you extracted the SDK (for example, /home/Bart/Downloads/dx-toolkit), and type:

$ source environment

This will place the DNAnexus client scripts in your executable PATH, and the Python DNAnexus libraries in the Python library path.
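
If you want to confirm that sourcing the environment file took effect, one option is a quick check from Python. The sketch below is not part of the SDK: check_sdk_environment is a hypothetical helper that uses only the standard library to look for a dx executable on your PATH and for an importable dxpy package.

```python
import importlib.util
import shutil


def check_sdk_environment():
    """Report whether the dx client and the dxpy library are visible.

    Hypothetical helper for illustration; it is not part of the DNAnexus SDK.
    """
    return {
        # shutil.which returns None when no "dx" executable is on PATH.
        "dx_on_path": shutil.which("dx") is not None,
        # find_spec returns None when no "dxpy" package can be located.
        "dxpy_importable": importlib.util.find_spec("dxpy") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_sdk_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If either value comes back as "no", re-running source environment in the same terminal session is the first thing to try.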

Next, type:

$ dx login

to log onto the Platform and select a project to work in.

Source code for the example apps used in this tutorial can be found in the doc/examples/dx-apps directory of the SDK. You can also browse the example programs on GitHub.

Revisiting the Quality Trimmer

We'll start by recreating the quality trimmer app from Intro to Building Apps as a more idiomatic Python app.

Run the command-line DNAnexus App Wizard (dx-app-wizard). The App Wizard can assist you in creating Python apps, so we'll use it here:

$ dx-app-wizard
⋮
App Name: python_trimmer_example
⋮ (<ENTER> to accept defaults)

Input Specification

You will now be prompted for each input parameter to your app.  Each parameter should have a unique name that uses only the underscore (`_`) and alphanumeric characters, and does not start with a number.

1st input name (<ENTER> to finish): input_name
Label (optional human-readable name) []: Input file
Your input parameter must be of one of the following classes:
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n

2nd input name (<ENTER> to finish): <ENTER>

Output Specification

You will now be prompted for each output parameter of your app.  Each parameter should have a unique
name that uses only the underscore (`_`) and alphanumeric characters, and does not start with a
number.

1st output name (<ENTER> to finish): output_file
Label (optional human-readable name) []: Output file
Choose a class (<TAB> twice for choices): file

2nd output name (<ENTER> to finish): <ENTER>

Template Options

You can write your app in any programming language, but we provide templates for the
following supported languages: Python, bash
Programming language [Python]: <ENTER>
⋮
Execution pattern [basic]: <ENTER>
⋮

Generated Files

Open the generated metadata file, dxapp.json. Its run specification (runSpec) defines which code your app runs and how that code is invoked. In this case the runSpec.file field refers to the file src/python_trimmer_example.py, which is executed whenever you run your app.
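
To make the relationship between dxapp.json and the source file concrete, here is a minimal sketch that reads a small excerpt of such a metadata file. The excerpt is an assumption for illustration: its names mirror the wizard session above, the interpreter value is a guess based on the python3 shebang, and the file that dx-app-wizard actually writes contains more fields. summarize_app is a hypothetical helper, not an SDK function.

```python
import json

# Illustrative excerpt only. The values mirror the wizard session in this
# tutorial, but the generated dxapp.json contains additional fields.
DXAPP_JSON_EXCERPT = """
{
  "name": "python_trimmer_example",
  "inputSpec": [
    {"name": "input_name", "label": "Input file", "class": "file", "optional": false}
  ],
  "outputSpec": [
    {"name": "output_file", "label": "Output file", "class": "file"}
  ],
  "runSpec": {
    "interpreter": "python3",
    "file": "src/python_trimmer_example.py"
  }
}
"""


def summarize_app(metadata):
    """Return the entry-point file and the declared input and output names."""
    return {
        "file": metadata["runSpec"]["file"],
        "inputs": [spec["name"] for spec in metadata["inputSpec"]],
        "outputs": [spec["name"] for spec in metadata["outputSpec"]],
    }


summary = summarize_app(json.loads(DXAPP_JSON_EXCERPT))
print(summary)
```

The input and output names listed here are the same names that appear as the keyword argument and the returned key in the Python source file.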

The source file src/python_trimmer_example.py was automatically generated by dx-app-wizard. It includes a skeleton that retrieves your input files from the Platform to the local filesystem and uploads the output files after your analysis has run.

Under the line that says "Fill in your application code here", add the following line to do your analysis:

subprocess.check_call("fastq_quality_trimmer -t 20 -Q 33 -i input_name -o output_file", shell=True)

Also import the subprocess module (just add import subprocess underneath the other imports at the top of the file).

Your python_trimmer_example.py file will now look like the following:

#!/usr/bin/env python3
# python_trimmer_example 1.0.0
#
# Some comments have been abbreviated here; create an app using dx-app-wizard
# or look in dx-toolkit/doc/examples/dx-apps/python_trimmer_example to read
# the comments in full.

import os
import dxpy
import subprocess

@dxpy.entry_point('main')
def main(input_name):

    # Create DXDataObject handlers for the input object(s).

    input_name = dxpy.DXFile(input_name)

    # Download the file to the local filesystem.

    dxpy.download_dxfile(input_name.get_id(), "input_name")

    # Fill in your application code here.

    subprocess.check_call("fastq_quality_trimmer -t 20 -Q 33 -i input_name -o output_file", shell=True)

    # Upload the output file (presumed to now exist at the path output_file)
    # back to the platform.

    output_file = dxpy.upload_local_file("output_file")

    # Returns a reference to the file object we just created.

    output = {}
    output["output_file"] = dxpy.dxlink(output_file)

    return output

dxpy.run()
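
The listing above passes the whole command as one string with shell=True. An alternative, shown here only as a sketch, is to build the command as a list of arguments and pass that list to subprocess.check_call without shell=True, so that file names containing spaces or shell metacharacters are passed through unchanged. build_trimmer_command is a hypothetical helper; the -t 20 and -Q 33 options are copied as-is from the tutorial's command.

```python
import shlex


def build_trimmer_command(input_path="input_name", output_path="output_file"):
    """Return the tutorial's trimmer command as an argument list.

    Hypothetical helper: the -t 20 and -Q 33 options are copied unchanged
    from the tutorial; only the two file paths are parameters.
    """
    return [
        "fastq_quality_trimmer",
        "-t", "20",
        "-Q", "33",
        "-i", input_path,
        "-o", output_path,
    ]


command = build_trimmer_command()

# Inside the app you would run it with: subprocess.check_call(command)
# Here we only print the equivalent shell command line.
print(shlex.join(command))
```

With the default paths, the printed line matches the command string used in the listing.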

The app inputs are listed as keyword arguments to the main entry point function, which is executed when you run the app. The return value of this function should be a hash that contains the names and values of your app's output parameters.
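
The shape of this contract can be sketched with plain dictionaries, without calling the Platform. In the sketch below, make_link is a stand-in for dxpy.dxlink(), the IDs file-xxxx and file-yyyy are placeholders, and the body of main skips the real download, analysis, and upload steps. It assumes that a file input arrives as a dict whose ID is stored under the "$dnanexus_link" key.

```python
def make_link(object_id):
    """Stand-in for dxpy.dxlink(): wrap an object ID in a link dict."""
    return {"$dnanexus_link": object_id}


def main(input_name):
    """Mimic the shape of the tutorial's entry point without the Platform."""
    # A file input arrives as a link dict; the ID sits under "$dnanexus_link".
    input_id = input_name["$dnanexus_link"]
    print(f"would download {input_id}")

    # Placeholder for the download, analysis, and upload steps of the real app.
    uploaded_id = "file-yyyy"  # hypothetical ID of the uploaded result

    # The return value maps each output parameter name to its value.
    return {"output_file": make_link(uploaded_id)}


result = main(input_name=make_link("file-xxxx"))
print(result)
```

The keyword argument name (input_name) and the returned key (output_file) must match the input and output names declared for the app.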

  • Inputs that are DNAnexus data objects are represented as dicts containing DNAnexus links. These can be passed as inputs to a handler class to construct a handler object (such as dxpy.DXFile(input_name) above, or with dxpy.get_handler()), or reduced to the string containing the object ID: input_name['$dnanexus_link'].

  • Inputs of primitive classes (int, float, string, boolean, or hash) are given directly as the corresponding Python data types.

  • Outputs that are data objects should be given as DNAnexus links, which can be constructed from handler objects or ID strings using dxpy.dxlink(). Outputs of primitive classes should be given using their Python data types.

To complete your app, download the FASTX-Toolkit (courtesy of the Hannon Lab), extract it, and put the fastq_quality_trimmer executable into the resources/usr/bin subdirectory of your app directory. Also, download the sample reads file we've provided and upload it to a project, if you haven't already:

$ dx upload small-celegans-sample.fastq

Building and Running an App on the Platform

Next, upload your app to the DNAnexus Platform. In the app's directory, run:

$ dx build -a .

When you build your app a second or subsequent time, also pass the --overwrite (or -f) flag to request the removal of old versions of your app.

Now we'll run the app on the Platform, instantiating a new job. Once your job has been enqueued, dx run prints a job ID that you can use to track the job's progress.

$ dx run python_trimmer_example -iinput_name=small-celegans-sample.fastq
            # Inspect the input parameters and press ENTER to confirm...
⋮
Calling applet-xxxx with output destination project-yyyy:/

Job ID: job-zzzz

During or after the execution of your job, you can check its status with dx describe JOB_ID. Once the job has finished successfully, this command also shows the job's outputs.

$ dx describe job-zzzz
Result 1:
ID              job-zzzz
⋮
State           running
⋮
Input           input_name = project-xxxx:file-yyyy
Output          -

Congratulations! You've run your first app on the DNAnexus Platform.

Many common bioinformatics pipelines can be broken into steps that each follow the pattern illustrated above. This pattern is generally the easiest way to take a preexisting analysis and make it run as a DNAnexus app or applet:

  • Download inputs from the Platform using the API bindings and save them to local files in your execution container.

  • Shell out to a subprocess to run whatever analysis you like, producing local files as output.

  • Upload outputs from the local files you've produced back into the Platform, again using the API bindings.
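
The three-step pattern this tutorial follows (download inputs, shell out to an analysis, upload outputs) can be sketched as a small function whose download and upload steps are passed in as callables. This is an illustration under stated assumptions, not the SDK's API: run_step, fake_download, and fake_upload are hypothetical names, the "analysis" is just a file copy run through the Python interpreter, and in a real app the two callables would be replaced by calls such as dxpy.download_dxfile and dxpy.upload_local_file from the listing earlier in this tutorial.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_step(input_link, download, command, upload, workdir):
    """Download one input, run a command on it, and upload one output."""
    local_input = Path(workdir) / "input_name"
    local_output = Path(workdir) / "output_file"

    # 1. Download the input from the Platform to a local file.
    download(input_link, local_input)

    # 2. Shell out to the analysis, which must write local_output.
    subprocess.check_call(command + [str(local_input), str(local_output)])

    # 3. Upload the result and return it under the output parameter name.
    return {"output_file": upload(local_output)}


# Stand-ins so the sketch runs without the Platform.
uploaded = {}


def fake_download(link, path):
    """Pretend to fetch the linked file by writing a tiny placeholder."""
    path.write_text("ACGT\n")


def fake_upload(path):
    """Pretend to upload: record the content and return a placeholder link."""
    uploaded["content"] = path.read_text()
    return {"$dnanexus_link": "file-yyyy"}


# The "analysis" here only copies its first argument to its second.
copy_command = [
    sys.executable,
    "-c",
    "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])",
]

with tempfile.TemporaryDirectory() as workdir:
    result = run_step(
        {"$dnanexus_link": "file-xxxx"},
        fake_download,
        copy_command,
        fake_upload,
        workdir,
    )

print(result)
```

Passing the download and upload steps in as arguments keeps the analysis step testable on a local machine before the app is built and run on the Platform.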
