Python Apps
Learn to build a simple Python app on the DNAnexus Platform.
This tutorial will demonstrate:
Writing, deploying, running, and monitoring apps in Python
Using the DNAnexus Platform APIs to represent and store your data
This tutorial assumes that you have already installed the DNAnexus SDK and worked through Intro to Building Apps. Refer back to that tutorial as necessary.
Before You Begin
To initialize the SDK environment, open your command line terminal, navigate to the directory where you extracted the SDK (for example, /home/Bart/Downloads/dx-toolkit), and type:
This will place the DNAnexus client scripts in your executable PATH, and the Python DNAnexus libraries in the Python library path.
Next, type dx login to log onto the Platform and select a project to work in.
Source code for the example apps used in this tutorial can be found in the doc/examples/dx-apps directory of the SDK. You can also browse the example programs on GitHub.
Revisiting the Quality Trimmer
We'll start by recreating the quality trimmer app from Intro to Building Apps as a more idiomatic Python app.
Run the command-line DNAnexus App Wizard (dx-app-wizard). The App Wizard can assist you in creating Python apps, so we'll use it here:
Generated Files
Open up the generated metadata file, dxapp.json. The run specification specifies what code your app is to run and how it should be invoked. In this case the runSpec.file field refers to a file src/python_trimmer_example.py. The specified file is executed whenever you run your app.
This file was automatically generated by dx-app-wizard; you can see that it includes a skeleton that handles retrieving your input files from the Platform to the local filesystem, and uploading the output files after your analysis has run.
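To make the role of the run specification concrete, here is a minimal sketch that parses a dxapp.json-style document and reads its runSpec.file field. The JSON text is an abbreviated, hypothetical stand-in for the generated file: only the runSpec.file path comes from this tutorial, while the name and interpreter values are assumptions.

```python
import json

# Abbreviated, hypothetical stand-in for the generated dxapp.json.
# Only runSpec.file is taken from this tutorial; the other values are assumptions.
DXAPP_TEXT = """
{
  "name": "python_trimmer_example",
  "runSpec": {
    "interpreter": "python3",
    "file": "src/python_trimmer_example.py"
  }
}
"""


def entry_script(dxapp_text):
    """Return the path of the script that the run specification points at."""
    spec = json.loads(dxapp_text)
    return spec["runSpec"]["file"]


print(entry_script(DXAPP_TEXT))
```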
Under the line that says "Fill in your application code here", add the following line to do your analysis:
Also import the subprocess module (just add import subprocess underneath the other imports at the top of the file).
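A minimal sketch of such an analysis call is given here, assuming the skeleton has saved the input to a local file named input_file and expects a local file named output_file to upload. Those two file names, the quality threshold of 20, and the helper names trimmer_command and run_trimmer are assumptions made for illustration; the -Q 33 option assumes Phred+33-encoded quality scores.

```python
import subprocess


def trimmer_command(input_path, output_path, min_quality=20):
    """Build the argument list for fastq_quality_trimmer.

    -t sets the quality threshold, -Q 33 selects Phred+33 quality encoding,
    and -i / -o name the input and output FASTQ files.
    """
    return [
        "fastq_quality_trimmer",
        "-t", str(min_quality),
        "-Q", "33",
        "-i", input_path,
        "-o", output_path,
    ]


def run_trimmer(input_path, output_path, min_quality=20):
    """Run the trimmer; check_call raises CalledProcessError on a nonzero exit."""
    subprocess.check_call(trimmer_command(input_path, output_path, min_quality))
```

Inside main, the call would then be run_trimmer("input_file", "output_file"). Passing an argument list rather than a shell string avoids quoting problems if a file name ever contains spaces.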
Your python_trimmer_example.py file will now look like the following:
The app inputs are listed as keyword arguments to the main entry point function, which is executed when you run the app. The return value of this function should be a hash (a Python dict) that contains the names and values of your app's output parameters.
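The calling convention can be illustrated in plain Python, without the dxpy library. This is only a sketch of the shape of the entry point: the min_quality parameter, the placeholder ID file-xxxx, and the way the job input is unpacked here are assumptions for illustration, not the library's actual dispatch code.

```python
def main(input_file, min_quality=20):
    """Stand-in for an app entry point: inputs arrive as keyword arguments."""
    # ... the analysis would run here ...
    # The returned dict maps output parameter names to their values.
    # As a placeholder, the input link is echoed back as the output.
    return {"output_file": input_file}


# Hypothetical job input, keyed by input parameter name.
job_input = {"input_file": {"$dnanexus_link": "file-xxxx"}}

# Each input becomes a keyword argument of main.
job_output = main(**job_input)
print(job_output)
```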
Inputs that are DNAnexus data objects are represented as dicts containing DNAnexus links. These can be passed as inputs to a handler class to construct a handler object (such as dxpy.DXFile(input_name) above, or with dxpy.get_handler()), or reduced to the string containing the object ID: input_name['$dnanexus_link'].
Inputs of primitive classes (int, float, string, boolean, or hash) are given directly as the corresponding Python data types.
Outputs that are data objects should be given as DNAnexus links, which can be constructed from handler objects or ID strings using dxpy.dxlink().
Outputs of primitive classes should be given using their Python data types.
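The link representation described above can be sketched in plain Python. The helpers link_to_id and id_to_link are hypothetical and exist only to show the dict shape; a real app would normally rely on the dxpy library for this. The sketch handles only the simple form in which the link value is an object ID string, and file-xxxx and read_count are placeholders.

```python
LINK_KEY = "$dnanexus_link"


def link_to_id(link):
    """Reduce a DNAnexus link dict to the object ID string it contains."""
    return link[LINK_KEY]


def id_to_link(object_id):
    """Wrap an object ID string in a link dict, ready to use as an output value."""
    return {LINK_KEY: object_id}


# A data object input arrives as a link dict (placeholder ID).
input_file = {"$dnanexus_link": "file-xxxx"}
object_id = link_to_id(input_file)

# Data object outputs are links; primitive outputs are plain Python values.
output = {"output_file": id_to_link(object_id), "read_count": 1000}
```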
To complete your app, download the FASTX-Toolkit (courtesy of the Hannon Lab), extract it, and put the fastq_quality_trimmer executable into the resources/usr/bin subdirectory of your app directory. Also, download the sample reads file we've provided and upload it to a project, if you haven't already:
Building and Running an App on the Platform
Next, upload your app to the DNAnexus Platform. In the app's directory, run:
When loading your app the second and subsequent times, also pass the --overwrite or -f flag to request the removal of old versions of your app.
Now we'll run the app on the Platform, instantiating a new job. When your job has successfully been enqueued, dx run prints out a job ID you can use to track the progress of your job.
During or after the execution of your job, you can check its status with dx describe JOB_ID. This command will show the outputs of the job once the job has finished (if successful).
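The same check can be scripted. The sketch below assumes that the dx client is on your PATH, that dx describe accepts a --json flag, and that the resulting description carries state and output fields; the helper names and the set of terminal states are assumptions made for illustration.

```python
import json
import subprocess

# States after which a job is assumed not to change any further.
TERMINAL_STATES = {"done", "failed", "terminated"}


def describe_job(job_id):
    """Return the parsed output of `dx describe JOB_ID --json`."""
    raw = subprocess.check_output(["dx", "describe", job_id, "--json"])
    return json.loads(raw)


def is_finished(description):
    """True once the job has reached a terminal state."""
    return description["state"] in TERMINAL_STATES


def summarize(description):
    """Return (state, outputs); outputs is None unless the job succeeded."""
    state = description["state"]
    outputs = description.get("output") if state == "done" else None
    return state, outputs
```

A script could call summarize(describe_job(job_id)) periodically until is_finished returns True.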
Congratulations! You've run your first app on the DNAnexus Platform.
Many common bioinformatics pipelines can be represented as steps that each follow the pattern illustrated above, which is generally the easiest way to turn a preexisting analysis into a DNAnexus app or applet:
Download inputs from the Platform using the API bindings and save them to local files in your execution container.
Shell out to a subprocess to run whatever analysis you like, producing local files as output.
Upload outputs from the local files you've produced back into the Platform, again using the API bindings.
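The three steps above can be sketched as one small function. To keep the sketch runnable without the Platform, the download and upload steps are passed in as callables; run_step and its parameters are hypothetical names, not part of the DNAnexus API.

```python
import subprocess


def run_step(download, command, upload):
    """Run one pipeline step: fetch inputs, shell out, then store outputs.

    download: callable that saves the step's inputs to local files
    command:  argument list for the external analysis program
    upload:   callable that stores the local output files and returns
              the dict of outputs for the step
    """
    download()
    subprocess.check_call(command)  # raises CalledProcessError on failure
    return upload()
```

In a DNAnexus app, download would typically wrap a call such as dxpy.download_dxfile, and upload would wrap dxpy.upload_local_file and return links to the uploaded files. Keeping the three stages separate makes it straightforward to exercise the analysis command locally with stand-in functions before building the app.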