Python Apps

Learn to build a simple Python app on the DNAnexus Platform.

This tutorial will demonstrate:

  • Writing, deploying, running, and monitoring apps in Python

  • Using the DNAnexus Platform APIs to represent and store your data

This tutorial assumes that you have already installed the DNAnexus SDK and worked through Intro to Building Apps. Refer back to that tutorial as necessary.

Before You Begin

To initialize the SDK environment, open your command line terminal, navigate to the directory where you extracted the SDK (for example, /home/Bart/Downloads/dx-toolkit), and type:

$ source environment

This will place the DNAnexus client scripts in your executable PATH, and the Python DNAnexus libraries in the Python library path.

Next, type:

$ dx login

to log onto the Platform and select a project to work in.

Source code for the example apps used in this tutorial can be found in the doc/examples/dx-apps directory of the SDK. You can also browse the example programs on GitHub.

Revisiting the Quality Trimmer

We'll start by recreating the quality trimmer app from Intro to Building Apps as a more idiomatic Python app.

Run the command-line DNAnexus App Wizard (dx-app-wizard). The App Wizard can assist you in creating Python apps, so we'll use it here:

$ dx-app-wizard

App Name: python_trimmer_example
⋮ (<ENTER> to accept defaults)

Input Specification

You will now be prompted for each input parameter to your app.  Each parameter should have a unique name that uses only the underscore (`_`) and alphanumeric characters, and does not start with a number.

1st input name (<ENTER> to finish): input_name
Label (optional human-readable name) []: Input file
Your input parameter must be of one of the following classes:
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n

2nd input name (<ENTER> to finish): <ENTER>

Output Specification

You will now be prompted for each output parameter of your app.  Each parameter should have a unique
name that uses only the underscore (`_`) and alphanumeric characters, and does not start with a
number.

1st output name (<ENTER> to finish): output_file
Label (optional human-readable name) []: Output file
Choose a class (<TAB> twice for choices): file

2nd output name (<ENTER> to finish): <ENTER>

Template Options

You can write your app in any programming language, but we provide templates for the
following supported languages: Python, bash
Programming language [Python]: <ENTER>

Execution pattern [basic]: <ENTER>
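
The naming rule the wizard states for input and output parameters can be checked ahead of time with a small regular expression. The following is a minimal sketch, not the Platform's own validation: the helper names is_valid_param_name and check_spec_names are hypothetical, and the pattern assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumed reading of the wizard's rule: ASCII letters, digits, and underscore
# only, and the first character must not be a digit.
PARAM_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_param_name(name):
    """Return True if name satisfies the naming rule quoted by the wizard."""
    return PARAM_NAME.fullmatch(name) is not None


def check_spec_names(names):
    """Return a list of problems found in names: invalid or duplicated entries."""
    problems = []
    seen = set()
    for name in names:
        if not is_valid_param_name(name):
            problems.append(f"invalid name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems


# The names used in this tutorial pass the check, so the list is empty.
print(check_spec_names(["input_name", "output_file"]))
```

A name such as 2nd_input (starts with a digit) or output-file (contains a hyphen) would be reported as invalid by this sketch.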

Generated Files

Open up the generated metadata file, dxapp.json. The run specification specifies what code your app is to run and how it should be invoked. In this case the runSpec.file field refers to a file src/python_trimmer_example.py. The specified file is executed whenever you run your app.

This file was automatically generated by dx-app-wizard; you can see that it includes a skeleton that handles retrieving your input files from the platform to the local filesystem, and uploading the output files after your analysis has run.

Under the line that says "Fill in your application code here", add the following line to do your analysis:

subprocess.check_call("fastq_quality_trimmer -t 20 -Q 33 -i input_name -o output_file", shell=True)

Also import the subprocess module (just add import subprocess underneath the other imports at the top of the file).

Your python_trimmer_example.py file will now look like the following:

#!/usr/bin/env python3
# python_trimmer_example 1.0.0
#
# Some comments have been abbreviated here; create an app using dx-app-wizard
# or look in dx-toolkit/doc/examples/dx-apps/python_trimmer_example to read
# the comments in full.

import os
import dxpy
import subprocess

@dxpy.entry_point('main')
def main(input_name):

    # Create DXDataObject handlers for the input object(s).

    input_name = dxpy.DXFile(input_name)

    # Download the file to the local filesystem.

    dxpy.download_dxfile(input_name.get_id(), "input_name")

    # Fill in your application code here.

    subprocess.check_call("fastq_quality_trimmer -t 20 -Q 33 -i input_name -o output_file", shell=True)

    # Upload the output file (presumed to now exist at the path output_file)
    # back to the platform.

    output_file = dxpy.upload_local_file("output_file")

    # Returns a reference to the file object we just created.

    output = {}
    output["output_file"] = dxpy.dxlink(output_file)

    return output

dxpy.run()

The app inputs are listed as keyword arguments to the main entry point function, which is executed when you run the app. The return value of this function should be a hash (a Python dict) that maps the names of your app's output parameters to their values.

  • Inputs that are DNAnexus data objects are represented as dicts containing DNAnexus links. These can be passed to a handler class to construct a handler object (such as dxpy.DXFile(input_name) above, or with dxpy.get_handler()), or reduced to the string containing the object ID: input_name['$dnanexus_link'].

  • Inputs of primitive classes (int, float, string, boolean, or hash) are given directly as the corresponding Python data types.

  • Outputs that are data objects should be given as DNAnexus links, which can be constructed from handler objects or ID strings using dxpy.dxlink(). Outputs of primitive classes should be given using their Python data types.
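
To make the shape of a DNAnexus link concrete, here is a minimal sketch in plain Python that does not use dxpy. The helpers make_link and link_to_id are hypothetical stand-ins written for illustration; in a real app you would use dxpy.dxlink() and the handler classes described above. The sketch covers only the simple form in which the link maps directly to an ID string, and file-xxxx is a placeholder ID.

```python
LINK_KEY = "$dnanexus_link"


def make_link(object_id):
    """Build a link dict of the simple form (hypothetical helper, not dxpy)."""
    return {LINK_KEY: object_id}


def link_to_id(value):
    """Return the object ID from a simple link dict, or pass an ID string through."""
    if isinstance(value, dict) and isinstance(value.get(LINK_KEY), str):
        return value[LINK_KEY]
    if isinstance(value, str):
        return value
    raise TypeError("expected a simple link dict or an ID string")


# What a data object input looks like when it arrives in main().
input_name = make_link("file-xxxx")
print(input_name)

# Reducing the link to the object ID string.
print(link_to_id(input_name))
```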

To complete your app, download the FASTX-Toolkit (courtesy of the Hannon Lab), extract it, and put the fastq_quality_trimmer executable into the resources/usr/bin subdirectory of your app directory. Also, download the sample reads file we've provided and upload it to a project, if you haven't already:

$ dx upload small-celegans-sample.fastq

Building and Running an App on the Platform

Next, upload your app to the DNAnexus Platform. In the app's directory, run:

$ dx build -a .

When building your app a second or subsequent time, also pass the --overwrite or -f flag to request the removal of old versions of your app.

Now we'll run the app on the Platform, instantiating a new job. When your job has successfully been enqueued, dx run prints out a job ID you can use to track the progress of your job.

$ dx run python_trimmer_example -iinput_name=small-celegans-sample.fastq
            # Inspect the input parameters and press ENTER to confirm...

Calling applet-xxxx with output destination project-yyyy:/

Job ID: job-zzzz

During or after the execution of your job, you can check its status with dx describe JOB_ID. This command will show the outputs of the job once the job has finished (if successful).

$ dx describe job-zzzz
Result 1:
ID              job-zzzz

State           running

Input           input_name = project-xxxx:file-yyyy
Output          -
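
If you want to check a job's state from a script rather than by reading the output yourself, one option is to request JSON from dx describe and read its state field. The following is a minimal sketch under stated assumptions: it assumes the --json flag of dx describe, it assumes that done, failed, and terminated are the states in which a job makes no further progress, and the sample text is a hand-written, heavily abbreviated stand-in for real describe output. The helper names are hypothetical.

```python
import json

# Assumption: states after which the job will not change any further.
TERMINAL_STATES = {"done", "failed", "terminated"}


def job_state(describe_json_text):
    """Return the value of the state field from describe output given as JSON text."""
    return json.loads(describe_json_text)["state"]


def is_finished(describe_json_text):
    """Return True if the job has reached one of the assumed terminal states."""
    return job_state(describe_json_text) in TERMINAL_STATES


# Abbreviated, hand-written sample; real output contains many more fields.
sample = '{"id": "job-zzzz", "state": "running"}'
print(job_state(sample), is_finished(sample))
```

In practice the JSON text would come from running dx describe job-zzzz --json, for example through subprocess.check_output.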

Congratulations! You've run your first app on the DNAnexus Platform.

Many common bioinformatics pipelines can be represented as a series of steps that each follow the pattern illustrated above. This is generally the easiest way to take a preexisting analysis and run it as a DNAnexus app or applet:

  • Download inputs from the Platform using the API bindings and save them to local files in your execution container.

  • Shell out to a subprocess to run whatever analysis you like, producing local files as output.

  • Upload outputs from the local files you've produced back into the Platform, again using the API bindings.
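
The three steps above can be sketched as one small function that takes the download and upload stages as callables. This is an illustrative sketch rather than part of the SDK: run_step and its parameter names are hypothetical, and the command is passed as an argument list instead of a shell string.

```python
import subprocess


def run_step(download_inputs, command, upload_outputs):
    """Run one pipeline step: fetch inputs, run a command, publish outputs.

    download_inputs -- callable that saves the inputs to local files
    command         -- argument list for the analysis program
    upload_outputs  -- callable that uploads local results and returns the
                       dict of output names and values
    """
    download_inputs()
    # check_call raises CalledProcessError if the command exits non-zero,
    # so a failed analysis stops the step before anything is uploaded.
    subprocess.check_call(command)
    return upload_outputs()
```

In the trimmer app, download_inputs would wrap the dxpy.download_dxfile call, command would be the fastq_quality_trimmer invocation, and upload_outputs would wrap dxpy.upload_local_file and dxpy.dxlink, as in the listing earlier in this tutorial.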
