Introduction to Building Apps

Learn to build a custom applet and run it on the DNAnexus Platform. Optionally, convert your applet to an app so it can be run by other users, in their own projects.

Applets and Apps

Applets and apps are types of executables that can be run on the DNAnexus Platform. They differ in several ways, notably in the context in which each can be used:

  • Applets are data objects, which live inside Platform projects.

  • Apps do not live inside projects, and can be published to allow other users to run them in projects of their choosing.

Applets and apps are created in the same way, up until the final build step. At this step, the developer specifies whether the executable in question should be an applet or an app. An applet can also be converted to an app later, by following these instructions.

For more on the difference between applets and apps, see this detailed comparison of their respective features.

Overview

In this tutorial, you’ll learn to create an applet based on an existing executable: FastQTrimmer, one of the FASTX-Toolkit collection of command-line tools for processing short-reads FASTA and FASTQ files. You’ll then use the applet to run FastQTrimmer on a FASTQ file, creating a trimmed reads file that you can then use for further analysis.

Figure 1 shows how you could run FastQTrimmer on your local machine, to process a sequence file in a project on the Platform. As you’ll note, you would need to 1) use dx download to download the source file to your local machine, then 2) process it using the fastq_quality_trimmer executable (i.e. FastQTrimmer), 3) use dx upload to upload the new trimmed reads file to 4) a project on the Platform.
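As a rough sketch, the local round trip described above could be scripted as follows. The function name trim_locally and the local file names input.fastq and trimmed.fastq are illustrative only. The trimmer flags are the ones used later in this tutorial, and the sketch assumes that both dx-toolkit and fastq_quality_trimmer are installed on your machine.

```shell
# Sketch of the local workflow: download, trim, upload.
# trim_locally is a hypothetical helper name, not part of dx-toolkit.
trim_locally() {
    remote_name=$1    # name of the FASTQ file in your Platform project

    # 1) Copy the source file from the project to the local machine
    dx download "$remote_name" -o input.fastq || return 1

    # 2) Trim the reads locally
    fastq_quality_trimmer -t 20 -Q 33 -i input.fastq -o trimmed.fastq || return 1

    # 3) Send the trimmed reads back to the currently selected project
    dx upload trimmed.fastq
}

# Example call (requires a logged-in dx session):
#   trim_locally small-celegans-sample.fastq
```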

By turning FastQTrimmer into an applet, you make this process much simpler and quicker. You don’t have to download or upload anything, and FastQTrimmer runs on the Platform's compute resources rather than on your local machine.

As shown in Figure 2, you’ll use two DNAnexus dx utilities in the course of creating your applet: 1) dx-app-wizard creates a skeleton directory for the applet, while dx build 2) adds the applet to the Platform as 3) a data object in your project.

Before You Begin

Before beginning this tutorial, download and install dx-toolkit. If you haven’t already done so, you may also want to run through the Command Line Quickstart. You should also make sure that you’re logged into the DNAnexus Platform, ideally using an API token, to prevent your being logged off before you’ve finished building your applet.
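A quick way to confirm both prerequisites is sketched below. It assumes that dx whoami exits with an error when you are not logged in; check_dx_ready is a hypothetical helper name, and DX_TOKEN in the comment stands for an API token you have generated yourself.

```shell
# Hypothetical preflight check: is dx installed, and is there an active login?
check_dx_ready() {
    if ! command -v dx >/dev/null 2>&1; then
        echo "dx not found: install dx-toolkit first"
        return 1
    fi
    if ! dx whoami >/dev/null 2>&1; then
        # To log in with an API token: dx login --token "$DX_TOKEN"
        echo "dx found, but you are not logged in: run 'dx login'"
        return 2
    fi
    echo "dx is installed and you are logged in"
}

check_dx_ready || true
```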

Step 1. Initial Downloads

Begin by downloading both the fastq_quality_trimmer executable and the sample input file, small-celegans-sample.fastq.

Step 2. Run the App Wizard

Next, you need to create a local directory and a source code template for your applet. While you can do this manually, the App Wizard enables you to do so via a guided workflow, in a few easy steps. Following is a walkthrough of this workflow, along with detail on how to respond to prompts from the Wizard:

  1. Launch the App Wizard from the CLI by entering the command dx-app-wizard

  2. Enter “mytrimmer” as the name of your applet.

  3. Optionally, enter a title - this is the name of your applet, as displayed in the product UI.

  4. Optionally, enter a summary - a short description of what your applet does.

  5. Enter a version number for your applet, or press <Enter> to accept the default value of “0.0.1”.

  6. Enter a name - such as "input_file" - for your applet’s 1st input parameter.

  7. Optionally, enter a human-readable label for the 1st input parameter.

  8. Select “file” from the list of input parameter class types displayed.

  9. Enter “n” to indicate that this parameter is not optional.

  10. Rather than entering details on a 2nd input parameter, press <Enter> to finish entering input parameter details.

  11. Enter a name - such as "output_file" - for the output parameter your applet will produce.

  12. Optionally, enter a human-readable label for the output file.

  13. Select “file” from the list of output parameter class types displayed.

  14. At the prompt, rather than entering details on a 2nd output parameter, press <Enter> to finish entering output parameter details.

  15. Set a timeout policy value. This is the maximum amount of time your applet is allowed to run before timing out. Press <Enter> if you want to accept the default value of 48 hours.

  16. Set “bash” as the programming language for your applet.

  17. For each of the remaining questions about template options, access permissions, and instance types, press <Enter> to accept the defaults.

The App Wizard will finish by creating a local directory called mytrimmer.

Here’s how this will all look, from the CLI:

$ dx-app-wizard
DNAnexus App Wizard, API v1.0.0
[...]

The name of your app must be unique on the DNAnexus platform. After creating your app for the
 first time, you will be able to publish new versions using the same app name. App names are
restricted to alphanumeric characters (a-z, A-Z, 0-9), and the characters ".", "_", and "-".

App Name: mytrimmer

[...] (Press <ENTER> to accept defaults)

Input Specification

You will now be prompted for each input parameter to your app. Each parameter should have a
 unique name that uses only the underscore "_" and alphanumeric characters, and does not
 start with a number.

1st input name (<ENTER> to finish): input_file
Label (optional human-readable name) []: <ENTER>
Your input variable must be of one of the following classes:
applet         array:file     array:record   file           int     
array:applet   array:float    array:string   float          record         
array:boolean  array:int      boolean        hash           string      

Choose a class (<TAB> twice for choices): file
This is an optional parameter [y/n]: n

2nd input name (<ENTER> to finish): <ENTER>

Output Specification

You will now be prompted for each output parameter of your app. Each parameter should have
a unique name that uses only the underscore "_" and alphanumeric characters, and does not
start with a number.

1st output name (<ENTER> to finish): output_file
Label (optional) []: <ENTER>
Your output parameter must be of one of the following classes:
applet         array:file     array:record   file           int
array:applet   array:float    array:string   float          record
array:boolean  array:int      boolean        hash           string
Choose a class (<TAB> twice for choices): file

2nd output name (<ENTER> to finish): <ENTER>

Timeout Policy

Set a timeout policy for your app. Any single entry point of the app that runs longer than
the specified timeout will fail with a TimeoutExceeded error. Enter an int greater than 0
with a single-letter suffix (m=minutes, h=hours, d=days) (e.g. "48h").

Timeout policy [48h]: <ENTER>

Template Options

You can write your app in any programming language, but we provide templates for the
following supported languages: Python, bash
Programming language: (Enter either Python or bash)

Access Permissions
If you request these extra permissions for your app, users will see this fact when launching
your app, and certain other restrictions will apply. For more information, see
https://documentation.dnanexus.com/developer/apps/app-permissions.

Access to the Internet (other than accessing the DNAnexus API).
Will this app need access to the Internet? [y/N]: (Enter y for Internet connection, or n for no Internet connection)

Direct access to the parent project. This is not needed if your app specifies outputs, which will be copied into the project after it's done running.
Will this app need access to the parent project? [y/N]: (Enter y for direct access, or n for no access)

System Requirements

Common AWS instance types:
┌─────────────┬─────────┬──────────┬─────────┐
│Name         │Memory_GB│Storage_GB│CPU_Cores│
├─────────────┼─────────┼──────────┼─────────┤
│mem1_ssd1_x2 │3.8      │32        │2        │
│mem1_ssd1_x4 │7.5      │80        │4        │
│mem1_ssd1_x8 │15.0     │160       │8        │
│mem1_ssd1_x16│30.0     │320       │16       │
│mem1_ssd1_x32│60.0     │640       │32       │
│mem2_ssd1_x2 │7.5      │32        │2        │
│mem2_ssd1_x4 │15.0     │80        │4        │
│mem2_ssd1_x8 │30.0     │160       │8        │
│mem3_ssd1_x2 │15.0     │32        │2        │
│mem3_ssd1_x4 │30.5     │80        │4        │
│mem3_ssd1_x8 │61.0     │160       │8        │
│mem3_ssd1_x16│122.0    │320       │16       │
│mem3_ssd1_x32│244.0    │640       │32       │
│mem1_ssd2_x2 │3.8      │160       │2        │
│mem1_ssd2_x4 │7.5      │320       │4        │
│mem1_ssd2_x8 │15       │640       │8        │
│mem1_ssd2_x16│30       │1280      │16       │
│mem1_ssd2_x36│60       │2880      │36       │
└─────────────┴─────────┴──────────┴─────────┘
Common Azure instance types:
┌───────────────────┬─────────┬──────────┬─────────┐
│Name               │Memory_GB│Storage_GB│CPU_Cores│
├───────────────────┼─────────┼──────────┼─────────┤
│azure:mem1_ssd1_x2 │3.9      │32        │2        │
│azure:mem1_ssd1_x4 │7.8      │64        │4        │
│azure:mem1_ssd1_x8 │15.7     │128       │8        │
│azure:mem1_ssd1_x16│31.4     │256       │16       │
│azure:mem2_ssd1_x1 │3.5      │128       │1        │
│azure:mem2_ssd1_x2 │7.0      │128       │2        │
│azure:mem2_ssd1_x4 │14.0     │128       │4        │
│azure:mem2_ssd1_x8 │28.0     │256       │8        │
│azure:mem2_ssd1_x16│56.0     │512       │16       │
│azure:mem3_ssd1_x2 │14.0     │128       │2        │
│azure:mem3_ssd1_x4 │28.0     │128       │4        │
│azure:mem3_ssd1_x8 │56.0     │256       │8        │
│azure:mem3_ssd1_x16│112.0    │512       │16       │
│azure:mem3_ssd1_x20│140.0    │640       │20       │
│azure:mem4_ssd1_x2 │28.0     │128       │2        │
│azure:mem4_ssd1_x4 │56.0     │128       │4        │
│azure:mem4_ssd1_x8 │112.0    │256       │8        │
│azure:mem4_ssd1_x16│224      │512       │16       │
│azure:mem4_ssd1_x32│448      │1024      │32       │
└───────────────────┴─────────┴──────────┴─────────┘
Default instance type: The instance type you select here will apply to all entry points
in your app unless you override it. See
https://documentation.dnanexus.com/developer/api/running-analyses/instance-types for more
information.
Choose an instance type for your app [mem1_ssd1_x4]: (Enter the default instance type you
wish to use)

*** Generating DNAnexus App Template... ***

[...]

Created files:
        mytrimmer/Readme.developer.md
        mytrimmer/Readme.md
        mytrimmer/dxapp.json
        mytrimmer/resources/
        mytrimmer/src/
        mytrimmer/src/mytrimmer.sh
        mytrimmer/test/

App directory created!

Running the DNAnexus build utility will create an executable on the DNAnexus platform.
Any files found in the resources directory will be uploaded so that they will be present
in the root directory when the executable is run.
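Before moving on, you may want to confirm that the directory contains the pieces the later steps rely on. The following is a minimal sketch: check_applet_dir is a hypothetical helper, and the demo builds a stand-in directory in a temporary location so that the snippet runs without the wizard. In practice you would point it at your real mytrimmer directory.

```shell
# Hypothetical helper: report any expected entries missing from an applet directory
check_applet_dir() {
    dir=$1
    missing=0
    for entry in dxapp.json src resources; do
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $dir/$entry"
            missing=1
        fi
    done
    return "$missing"
}

# Demo on a stand-in layout that mimics part of the wizard's output
demo_root=$(mktemp -d)
mkdir -p "$demo_root/mytrimmer/src" "$demo_root/mytrimmer/resources"
touch "$demo_root/mytrimmer/dxapp.json"

check_applet_dir "$demo_root/mytrimmer" && echo "layout looks complete"
```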

Step 3. Add the Executable

The DNAnexus Platform runs applets on a Linux VM with a stock Ubuntu 20.04 environment. When run, your applet will in turn run an executable: the fastq_quality_trimmer file you downloaded in Step 1. This executable is not available on the VM by default. To make it available, enter the following commands, which create a usr/bin directory inside your applet's local resources directory, then copy the fastq_quality_trimmer file into it. When the applet runs, the contents of the resources directory are placed in the root directory of the VM, so the executable ends up at /usr/bin/fastq_quality_trimmer:

$ mkdir -p mytrimmer/resources/usr/bin/

$ cp /path/to/fastq_quality_trimmer mytrimmer/resources/usr/bin/

Note that in the second command, you’ll need to provide the path to the fastq_quality_trimmer file on your local machine, substituting this for /path/to/.

Once the fastq_quality_trimmer file is in the directory mytrimmer/resources/usr/bin/, it can be accessed by dx build, which you’ll use to build your applet, as detailed in Step 5 below. dx build will then package the executable, along with any other files stored in the mytrimmer/resources directory, as part of your applet.
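To make the mapping concrete, the sketch below prints where a file placed under resources should appear when the applet runs, based on the App Wizard's note that files in the resources directory are present in the root directory at run time. vm_path_for is a hypothetical helper, the demo uses a stub file in place of the real binary, and the executable-bit check is an extra precaution rather than something this tutorial requires.

```shell
# Hypothetical helper: map a local path under <applet>/resources to its run-time path
vm_path_for() {
    applet_dir=$1
    local_file=$2
    rel=${local_file#"$applet_dir"/resources/}   # strip the local prefix
    echo "/$rel"
}

# Demo with a stub standing in for the real fastq_quality_trimmer binary
demo_root=$(mktemp -d)
applet_dir="$demo_root/mytrimmer"
mkdir -p "$applet_dir/resources/usr/bin"
printf '#!/bin/sh\necho stub\n' > "$applet_dir/resources/usr/bin/fastq_quality_trimmer"
chmod +x "$applet_dir/resources/usr/bin/fastq_quality_trimmer"

vm_path_for "$applet_dir" "$applet_dir/resources/usr/bin/fastq_quality_trimmer"

# The file must be executable for the applet script to run it by name
[ -x "$applet_dir/resources/usr/bin/fastq_quality_trimmer" ] && echo "executable bit is set"
```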

Step 4. Write the Code

In the main mytrimmer directory, you’ll see a file named dxapp.json. Open dxapp.json in a text editor. You’ll see that the runSpec block contains specs for both the interpreter to be used, and the name of the program to be run:

"interpreter": "bash", "file": "src/mytrimmer.sh"

Close the file. Navigate to the src directory and open the mytrimmer.sh file in a text editor. You’ll see that in the main() block, some of the code has been filled in for you.

Edit the code in the main() block to incorporate the line that will run your executable. See the code line beginning with fastq_quality_trimmer -t 20 in the code block below. Note that some of the boilerplate comments have been omitted for brevity’s sake.

#!/bin/bash

main() {

    # When the applet is run, the variable "input_file" is already set
    # to the DNAnexus link to the file object. Here, we download it to the
    # job's scratch space

    dx download "$input_file" -o input_file

    # Insert the following line between the download and upload lines

    fastq_quality_trimmer -t 20 -Q 33 -i input_file -o output_file

    # Here, we set the variable "output_file" to be the ID of the
    # uploaded file.

    output_file=$(dx upload output_file --brief)

    # This line reports the uploaded file ID under the output field
    # called "output_file".

    dx-jobutil-add-output output_file "$output_file" --class=file
}

Step 5. Build the Applet

Next you’ll build the applet using dx build.

Select the project in which you want to use the applet:

  1. Enter the command dx select

  2. Enter the number corresponding to the project in which you want to use the applet

Now make sure you’re in the directory inside of which you created the mytrimmer directory. From that directory, enter the command:

$ dx build mytrimmer

Note that you can run dx build from within the mytrimmer directory if you prefer. If you do so, omit the directory name from the command:

$ dx build

Once dx build completes, you’ll see a confirmation message displaying the unique ID assigned by the Platform to your new applet. It will look like this:

{"id": "applet-G7GFz9805XQPKQj14ZqX9Vq3"}
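If you script the build, you may want to capture the applet ID rather than copy it by hand. The following is a minimal sketch, assuming the confirmation is the one-line JSON shown above. The sample string stands in for the real output of dx build, and the variable names are illustrative.

```shell
# Stand-in for: build_output=$(dx build mytrimmer)
build_output='{"id": "applet-G7GFz9805XQPKQj14ZqX9Vq3"}'

# Pull the value of the "id" field out of the one-line JSON
applet_id=$(printf '%s\n' "$build_output" | sed -n 's/.*"id": *"\([^"]*\)".*/\1/p')

echo "$applet_id"
```

A dedicated JSON parser would be more robust than sed if the output ever spans several lines or gains extra fields.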

Your applet will now appear as a data object in your project. To see it, enter the command:

$ dx ls

To get more info on your applet, enter the command:

$ dx describe mytrimmer

You'll see a description that looks like the following, with the fastq_quality_trimmer executable shown using its Platform ID, in the bundledDepends section:

$ dx describe mytrimmer
Result 1:
...
Name            mytrimmer
...
Input Spec      input_file (file)
Output Spec     output_file (file)
Interpreter     bash
bundledDepends  resources.tar.gz (file-B42KQ3pqqBkGJz8B3J900049)
...

Step 6. Upload the Sample Input File

Before you run your applet using the sample input file you downloaded in Step 1, you must upload that file to the Platform.

Navigate to the local directory to which you downloaded the small-celegans-sample.fastq file. Upload it to the Platform using the command:

$ dx upload small-celegans-sample.fastq

The file will appear in your project, as you’ll see by entering the command:

$ dx ls

Step 7. Run the Applet

You are now ready to launch the analysis in the cloud, using 4) the dx run command. When you launch the analysis, the Platform will bring up 5) a new Linux VM to run your code.

Launch the Applet

Now launch the applet by entering the command:

$ dx run mytrimmer -iinput_file=small-celegans-sample.fastq

You’ll see a prompt asking you to confirm that you want to run the job with the input you designated. Enter “Y”.

You’ll see a confirmation that includes a Job ID, and a prompt asking if you want to watch, or monitor, your job’s progress:

Calling applet-G7GFz9805XQPKQj14ZqX9Vq3 with output destination project-G7FbxV805XQ0k10vKbG474p9:/

Job ID: job-G7GG1f005XQ350gFB9VY0Kb

Watch launched job now? [Y/n]

Monitor the Job

Enter “Y” if you’d like to monitor your job. You’ll see a log file giving detail on every step of the job's progress.
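For scripted runs, both prompts can be skipped. The sketch below assembles a dx run command with -y (answer yes to the confirmation) and --watch (start following the job log right away). The run_trimmer helper and the DRY_RUN switch are illustrative names; by default the function only prints the command instead of launching a job.

```shell
# Hypothetical wrapper around the dx run command used in this step
run_trimmer() {
    input_name=$1
    # Replace the function's positional parameters with the full command line
    set -- dx run mytrimmer -iinput_file="$input_name" -y --watch
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"         # actually launch the job
    else
        echo "$*"    # dry run: just show the command
    fi
}

run_trimmer small-celegans-sample.fastq
```

Setting DRY_RUN=0 before the call would launch the job for real, which requires a logged-in dx session and the applet built in Step 5.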

Access the Job's Output

When the job has finished, enter the command dx ls to view the files in your project. This list will now include the output file generated by your applet.

Enter the command dx get to retrieve the output file:

$ dx get output_file

To see the first ten lines of the output file, enter the command:

$ head output_file

This excerpt of the file should look something like this, and thus should show that your applet worked correctly:

@SRR070372.1 FV5358E02GLGSF length=78
TTTTTTTTTTTTTTTTTTTTTTTTTTTNTTTNTTTNTTTNTTTATTTATTTATTTATTATTATATATATATATATA
+SRR070372.1 FV5358E02GLGSF length=78
...000//////999999<<<=<<666!602!777!922!688:669A9=<=122569AAA?>@BBBBAA?=<966
@SRR070372.2 FV5358E02FQJUJ length=177
TTTCTTGTAATTTGTTGGAATACGAGAACATCGTCAATAATATATCGTATGAATTGAACCACACGGCACATATTTGAACTTGTTCGTGAAATTTAGCGAACCTGGCAGGACTCGAACCTCCAATCTTCGGATCCGAAGTCCGACGCCCCCGCGTCGGATGCGTTGTTACCACTGCTT
+SRR070372.2 FV5358E02FQJUJ length=177
222@99912088>C<?7779@<GIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIC;6666IIIIIIIIIIII;;;HHIIE>944=>=;22499;CIIIIIIIIIIIIHHHIIIIIIIIIIIIIIIH?;;;?IIEEEEEEEEIIII77777I7EEIIEEHHHHHIIIIIIIIIIIIII
@SRR070372.3 FV5358E02GYL4S length=70
TTGGTATCATTGATATTCATTCTGGAGAACGATGGAACATACAAGAATTGTGTTAAGACCTGCATAA

You can also run a program like seqmagick to verify that the sequences have been trimmed.
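If you do not have such a tool at hand, a rough check is possible with awk alone. The sketch below counts reads and total bases in a FASTQ file, assuming the plain four-lines-per-record layout (no wrapped sequence lines). fastq_stats is a hypothetical helper, and the tiny demo file is made up for illustration.

```shell
# Hypothetical helper: print "<reads> <total bases>" for a 4-line-per-record FASTQ file
fastq_stats() {
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "%d %d\n", reads, bases }' "$1"
}

# Demo on a made-up two-read file (8 bases + 4 bases)
demo_fastq=$(mktemp)
cat > "$demo_fastq" <<'EOF'
@r1
ACGTACGT
+
IIIIIIII
@r2
ACGT
+
IIII
EOF

fastq_stats "$demo_fastq"    # prints: 2 12
```

Running fastq_stats on both small-celegans-sample.fastq and output_file lets you compare the totals; after trimming, you would expect the base count of the output to be no larger than that of the input.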

Behind the Scenes

Figure 4 gives an overview of how your applet is run. Once the Platform has instantiated a Linux VM, it runs your applet, executing the shell script commands you provided. The script runs just as it would on your local computer, 6) downloading the reads to the hard drive of the virtual machine, 7) running FASTX-Toolkit, then 8) uploading the resulting file to 9) your project.

Advanced Options

Convert Your Applet to an App

As noted above, you can convert your applet to an app, to enable others to use it in their own projects. Follow these directions to convert it to an app.

Advanced Applet Options

If you wish to change the inputs or outputs of your applet, or request additional execution resources - adding network access or more CPU or memory, for example - edit the file mytrimmer/dxapp.json and re-run dx build. See the Advanced App Tutorial for a detailed overview of the dxapp.json file, and how to edit it.
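When you rebuild, an applet named mytrimmer already exists in your project, so dx build may refuse to create a second one with the same name. The following is a minimal sketch, assuming you want to replace the old applet: the -f (--overwrite) flag of dx build removes the existing applet of the same name before building. rebuild_applet is a hypothetical wrapper, and it only prints the command when dx is not installed.

```shell
# Hypothetical wrapper: rebuild the applet, replacing the previous build
rebuild_applet() {
    applet_dir=${1:-mytrimmer}
    if ! command -v dx >/dev/null 2>&1; then
        echo "dx not found; would run: dx build -f $applet_dir"
        return 0
    fi
    dx build -f "$applet_dir"
}

# Example call, from the directory that contains mytrimmer:
#   rebuild_applet mytrimmer
```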

Other App Wizard Templates

When running dx-app-wizard, you selected the "basic" execution template. This means that your applet will run on a single machine. You can use the wizard’s --template option to set more advanced execution options:

  • basic: Your applet or app will run on a single machine.

  • parallelized: Your applet or app will subdivide a large chunk of work into multiple pieces that can be processed in parallel and independently of each other, followed by a final stage that will merge and process the results as necessary.

  • scatter-process-gather: Similar to parallelized but with the addition of a "scatter" entry point. This allows you to break out the execution for splitting up the input, or you can call a separate applet or app to perform the splitting.

Try the other available templates to see simple examples of how to parallelize your execution over multiple machines in the cloud, by using additional entry points. You can also use other programming languages, leveraging DNAnexus client libraries. While the dx client provides a wide range of advanced functionality, client libraries can provide a richer experience for programmatically accessing and modifying data on the Platform, in the programming language of your choice.

Learn More

See the Advanced App Tutorial to get a better understanding of the app directory structure and how to manually modify app inputs, outputs, and metadata.

See the Job Lifecycle page for detail on the progression of a job's states and the reasons a job may fail.
