App Execution Environment

The Execution Environment is the system on which your app executes when running on the DNAnexus Platform. Currently, the Platform supports Ubuntu Linux 24.04 and Ubuntu Linux 20.04. The App API lets you specify the amount of computational resources your app will need (the instance type that it will be launched on) and the software packages that it requires as dependencies.

DNAnexus is working to phase out outdated terminology and change scripts using those terms to remove inappropriate language. The terms "master" and "slave" will be replaced with "driver" and "clusterWorker" in Spark documentation. DNAnexus will also eventually replace the older terms in the codebase. For now, variable names and scripts containing the older terms will still be used in the actual code.

Key Concepts

Jobs

When you send an /app-id/run, /applet-id/run, or /job/new call to the DNAnexus API, a job object is generated, then dispatched to a worker node when all of its inputs are ready and it is considered "runnable."
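
For illustration, the same thing can be done from a script through the dxpy bindings rather than a raw API call; the applet ID and input field below are hypothetical placeholders, and DXApplet.run issues the /applet-xxxx/run call for you:

import dxpy

# Hypothetical applet and input field, for illustration only
applet = dxpy.DXApplet("applet-xxxx")
job = applet.run({"reads": dxpy.dxlink("file-xxxx")})

# The resulting job object is dispatched to a worker node once it is runnable
print(job.get_id(), job.describe()["state"])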

Running a Job

The worker node performs the following:

  • Generates a virtualized Linux container (virtual machine) just for your job. The container is a full-featured Linux OS.

    • If your job runs a sub-job using /job/new, the sub-job gets a completely independent virtual machine. Therefore, each individual job is free to make use of all the resources of its instance (see the sketch after this list).

  • Reads the runSpec.execDepends field of your app and installs packages in the container.

  • Configures networking and logging services in the container.

  • Fetches the code given in the runSpec.code field of the app and saves it in the working directory inside an interpreter-specific execution template.

  • If any file objects are found in runSpec.bundledDepends, they are also downloaded to the root directory / and unpacked if compressed (with a mechanism that supports at least tar, gz, and other popular formats).

  • Executes the bootstrap script (if provided) on all nodes. At this point, clusterWorker nodes should be fully initialized; they do not perform the following step of executing the job code.

  • Executes the code with the interpreter given in runSpec.interpreter.

  • Waits for the code to complete, reports the output or any errors to the platform, and destroys the virtual machine.
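
As mentioned in the sub-job step above, each child job runs on its own worker. A minimal sketch of that pattern, assuming an applet with "main" and "myfunc" entry points and hypothetical input/output field names (dxpy.new_dxjob wraps the /job/new call):

import dxpy

@dxpy.entry_point("main")
def main(count):
    # Each sub-job runs the "myfunc" entry point on an independent
    # virtual machine with the full resources of its instance type.
    subjobs = [dxpy.new_dxjob(fn_input={"index": i}, fn_name="myfunc")
               for i in range(count)]
    # Return job-based object references that resolve when the sub-jobs finish
    return {"results": [subjob.get_output_ref("result") for subjob in subjobs]}

@dxpy.entry_point("myfunc")
def myfunc(index):
    return {"result": index * 2}

dxpy.run()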

Additional Information

The rest of this document describes the details of what happens in the virtual machine, what is expected of your executable in order to successfully report your output and any errors, and how you can request or provide additional resources for your job.

Environment Variables in the Container

The following environment variables are set in the container by the system before running your code. Their values communicate the information necessary to access the API and your job's data. Please note that the variables below are automatically consumed by the language bindings supplied by DNAnexus, so there is often no action necessary to use them.

Variable | Meaning | Example value
-------- | ------- | -------------
DX_SECURITY_CONTEXT | Contains the authentication information needed to access the API. | {"auth_token":"outside","auth_token_type":"Bearer"}
DX_APISERVER_HOST | Contains the API server hostname. | 10.0.3.1
DX_APISERVER_PORT | Contains the API server port. | 443
DX_JOB_ID | Contains the ID of the job executing in the Execution Environment. | job-xxxx
DX_WORKSPACE_ID | Contains the ID of the job workspace container object on the platform. | container-xxxx
DX_RESOURCES_ID | Contains the ID of the app global resources container object on the platform. | container-xxxx
DX_PROJECT_CACHE_ID | Contains the ID of the app project cache container object on the platform. | container-xxxx
DX_PROJECT_CONTEXT_ID | Contains the ID of the project context (the project billed for this computation, and where the outputs of the origin job will appear). | project-xxxx

Variables only present when the job is running on a cluster:

Variable | Meaning | Example value
-------- | ------- | -------------
DX_CLUSTER_MASTER_IP | The private IP address of the cluster driver node (only present on clusterWorker nodes) | 172.168.1.120
DX_CLUSTER_LOCAL_IP | The private IP address of this cluster node | 10.1.1.101
DX_CLUSTER_HOSTNAME | FQDN of this cluster node | ec2_instance_public.hostname
DX_CLUSTER_NODE_ID | A unique integer ID for each node in the cluster; node 0 is always the driver node | 1

Variables only present when the job is running on a cluster of type dxspark:

Variable | Meaning | Example value
-------- | ------- | -------------
DX_CLUSTER_METASTORE_URI | URI for accessing the Hive metastore | hivemetastore.uri
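
The dxpy bindings consume these variables automatically, but they can also be read directly from the environment; a minimal sketch (only meaningful inside a running job, where the variables are set):

import json
import os

import dxpy

# Raw values as seen by any process in the container
print(os.environ["DX_JOB_ID"], os.environ["DX_PROJECT_CONTEXT_ID"])
security_context = json.loads(os.environ["DX_SECURITY_CONTEXT"])
print(security_context["auth_token_type"])

# The same information after being consumed by the bindings
print(dxpy.JOB_ID, dxpy.WORKSPACE_ID, dxpy.PROJECT_CONTEXT_ID)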

Job I/O and Error Reporting

Job input, output, and error data are passed through JSON files in the working directory where your code runs.

Special Files in the Initial Working Directory

File | Contents | Generated by | Example value
---- | -------- | ------------ | -------------
job_input.json | Serialized JSON contents of job.input for the currently running job | System | {"x": 1, "y": true}
job_output.json | Serialized JSON contents of the output of the currently running job (to be saved into the job object by the system) | Bindings or client code | {"z": {"$dnanexus_link": "file-xxxx"}}
job_error.json | Serialized JSON contents of an error encountered by the currently running job, if any | Bindings or client code | {"error": {"type": "AppError", "message": "x must be at least 2"}}

job_input.json

Before executing your code, the system saves the job input in the file job_input.json in the working directory where your code will run. You can either read this file directly, or rely on the execution template and language-specific bindings code (if available) to read it in and provide the input for you. For example, the Python language bindings will read in the job input and pass it as keyword arguments directly to your entry point function.
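
If you read it yourself, it is plain JSON in the working directory; a minimal sketch in Python:

import json

# Read the job input exactly as the system wrote it
with open("job_input.json") as f:
    job_input = json.load(f)

print(job_input)  # e.g. {"x": 1, "y": True}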

job_output.json

When your code has finished running, it must return to the system the values it wants to save in the output field of the job object representing the current job. This is done by serializing these values in the file job_output.json in the original working directory. You can either do this yourself, or rely on the execution template and language-specific bindings code (if available) to save the output for you. For example, the Python language bindings will expect your entry point function to return a hash with the output values, and serialize that.

NOTE: An empty hash ({}) must be saved to job_output.json even if your applet does not have an output spec.
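
If you write the file yourself, serialize the output values (including any DNAnexus links) to job_output.json in the original working directory; a minimal sketch with a hypothetical output field named "z":

import json

import dxpy

# Use an empty dict ({}) here if your applet has no output spec
job_output = {"z": dxpy.dxlink("file-xxxx")}

with open("job_output.json", "w") as f:
    json.dump(job_output, f)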

job_error.json

If your code encounters a fatal error condition, it must exit with a non-zero exit status (raising an error or throwing an exception will make this happen in most languages). To facilitate debugging, it is also recommended that the job provide extended information about the error. Depending on the interpreter, throwing an exception may be sufficient to report an error message, or you may have to write the job_error.json file directly. The system inspects the contents of this exception or file and sets the failure metadata for the job object accordingly.

The file should be formatted like so:

{"error": {"type": "AppInternalError", "message": "Error while running micromap"}}

Error Types

The field error.type in the file job_error.json should be set to one of the recognized error types.

Recognized Type | Description | Example
--------------- | ----------- | -------
AppError | Recognized actionable error. Use this to request corrective action by the user to change application input. The error message is exposed in the UI. | {"error": {"type": "AppError", "message": "Out of memory: Please select a larger instance type for your job"}}
AppInternalError | Unexpected application error. Use this to indicate an error which requires debugging. The error message is not exposed in the UI. | {"error": {"type": "AppInternalError", "message": "Division by zero at line 256"}}

Monitoring Jobs

The stdout and stderr of every running job are automatically captured and logged for you, and you can access these logs through the API as the job is running or after it has finished.

Debugging and Connecting to Jobs via SSH

Jobs can optionally be configured to allow SSH connections from a specified range of IPs, and to hold the execution environment for debugging when certain types of errors happen (debug hold). For more information, see Connecting to Jobs via SSH.

Code Interpreters

Apps and applets can be interpreted with either the "bash" or the "python3" interpreter.

python3

The Python 3 interpreter makes it easy to write apps in Python.

Entry Points

To designate entry points in your Python script, simply decorate the functions with @dxpy.entry_point("entry_point_name"). The following code snippet demonstrates when each part of your script will be run.

import dxpy

@dxpy.entry_point("myfunc")
def myfunc():
    # Gets run when you make a /job/new API call with "function" set to "myfunc"
    pass

@dxpy.entry_point("main")
def main():
    # Gets run when you make an /app(let)-xxxx/run API call OR
    # a /job/new API call with "function" set to "main"
    pass

# The following line will call the appropriate entry point.
dxpy.run()

Job Input

While the job's input will always be provided in the file job_input.json, the Python interpreter will also provide the key-value pairs as keyword arguments when calling your entry points.

Exception Handling

If your app throws dxpy.AppError, then the interpreter will report the job failure with failure reason AppError. In general, this error should be used for errors resulting from invalid user input. If your app throws an exception of any other class, the job will report the failure as AppInternalError.
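
For example (the input name and check below are hypothetical):

import dxpy

@dxpy.entry_point("main")
def main(x):
    if x < 2:
        # Reported with failure reason AppError; the message is shown to the user
        raise dxpy.AppError("x must be at least 2")
    # Any other exception type is reported as AppInternalError
    return {"doubled": x * 2}

dxpy.run()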

Bash

This is the general-purpose interpreter which you can use to run whatever shell commands and/or executables you may have packaged together with your app or applet.

Entry Points

To create multiple entry points for your bash executable, simply create bash functions with the same name as your entry point. The following code snippet demonstrates when each part of your script will be run.

# Anything outside the function declarations is always run

myfunc() {
    # Gets run when you make a /job/new API call with "function" set to "myfunc"
    :  # placeholder no-op; replace with your commands
}

main() {
    # Gets run when you make an /app(let)-xxxx/run API call OR
    # a /job/new API call with "function" set to "main"
    :  # placeholder no-op; replace with your commands
}

Job Input

While the job's input will always be provided in the file job_input.json, the bash interpreter will also set an environment variable for each key in the job input with value equal to the key's value. Case is preserved.

Exception Handling

Your bash script is interpreted with the -e flag set, so if any command exits with a nonzero exit code, your app will fail at that point with failure reason AppInternalError. To report an error with a more helpful error message, you must first write to the file job_error.json before letting a command exit with a nonzero exit code.

Available Resources

Computational Power and Memory

Default machine sizes vary by region. The table below shows the default instance type for each region; for more precise specifications, see Instance Types.

Region | Default instance type
------ | ---------------------
aws:us-east-1 | mem2_hdd2_x2
aws:ap-southeast-2 | mem2_hdd2_x2
aws:eu-central-1 | mem1_ssd1_x4
azure:westus | azure:mem2_ssd1_x1
azure:westeurope | azure:mem2_ssd1_x1

If you need more computational resources for your app, you can request a different machine instance type in the runSpec.systemRequirements.instanceType field of your dxapp.json.

Some of the resources on a worker will be shared with DNAnexus Platform processes that help run your job.

Choosing an Application Execution Environment

To specify the Application Execution Environment, set the runSpec.distribution, runSpec.release, and runSpec.version fields in your dxapp.json using the values in the table below:

distribution, release, version | Supported interpreters
------------------------------ | ----------------------
"Ubuntu", "24.04", "0" | python3, bash
"Ubuntu", "20.04", "0" | python3, bash

Network Access

Networking is pre-configured in the execution environment. Network access is restricted by default and must be requested explicitly using the access.network field of your dxapp.json file, or of the input to /applet/new or /app/new. For example, use {"access": {"network": ["*"]}} to request unrestricted access.

When network access is restricted, the following are disabled:

  • Outgoing communication

  • DNS resolution (except for domains for services that remain available, as listed below)

The following remain available when network access is restricted:

  • Access to the DNAnexus API server

  • Access to DNAnexus project data

  • Access to DBClusters

  • The ability to install Ubuntu packages from both official Ubuntu and DNAnexus repositories

  • The ability to SSH into the job

  • The ability to HTTPS into the job, if it is an httpsApp job

  • Communication between cluster nodes

  • Thrift

  • The DNAnexus Platform Metastore

  • The Platform Vizserver

  • Snowflake

Software Packages

DNAnexus Utilities

The contents of the DNAnexus SDK (dx-toolkit) are available in the container, and environment variables such as PATH, PYTHONPATH, PERL5LIB, etc. are automatically set before your app runs, so that you can run utilities from the SDK simply as dx etc., as well as import the bindings libraries in scripting languages.

External Utilities

See also the dxda (DNAnexus Download Agent) and dxfuse (a FUSE filesystem for platform data) utilities.

If your program relies on packages that must be present in the system in order to run, you can specify them in the dxapp.json (or directly in the Run Specification input to /app/new) like so:

    { "runSpec": {
        "execDepends": [
            {"name": "samtools"},
            {"name": "bedtools", "version": "2.16.1-1", "stages": ["work"]},
            {"name": "dx-toolkit",
             "package_manager": "git",
             "url": "https://github.com/dnanexus/dx-toolkit.git",
             "tag": "master",
             "destdir": "/opt/dx-toolkit",
             "build_commands": "make install DESTDIR=/ PREFIX=/opt/dnanexus"},
            {"name": "pysam",
             "package_manager": "pip",
             "version": "0.7.4"},
            {"name": "Bio::SeqIO",
             "package_manager": "cpan",
             "version": "1.0b3"},
            {"name": "bio",
             "package_manager": "gem",
             "version": "1.4.3"},
             {"name": "plyr",
             "package_manager": "cran",
             "version": "1.8.1"},
            {"name": "ggplot2",
             "package_manager": "cran"}
        ]
        ...
      },
      ...
    }

Here, the first dependency is an APT package. The second dependency is also an APT package, but specifies a particular version and limits the entry points (referred to as stages in this context) for which the dependency is installed to just the "work" entry point (by default, dependencies are installed for all entry points). The third dependency instructs the system to fetch directly from a Git repository, and the rest are dependencies for language-specific package managers:

Package manager | Language | Main repository
--------------- | -------- | ---------------
pip | Python | The Python Package Index (PyPI)
cpan | Perl | Comprehensive Perl Archive Network (CPAN)
gem | Ruby | RubyGems
cran | R | Comprehensive R Archive Network (CRAN)

NOTE: the requested APT packages will be installed but their "Recommends" will not be installed. You can simulate this behavior with apt-get install --no-install-recommends PACKAGES ... on an Ubuntu system.

NOTE: To access any repository other than APT, your app or applet must request network access to the repository's host by adding an entry like {"access": {"network": ["*"]}} to its dxapp.json metadata.

External APT Repositories

Loading APT packages in your execDepends only works for packages that are part of the default Ubuntu repositories. If you want to install a package from a third-party repository, you'll have to do so yourself at the beginning of your app code. In addition to configuring APT on the system to use the desired repository, you'll need to bypass the Execution Environment's built-in APT caching proxy and ensure your app has sufficient network permissions. See the external_apt_repo_example app in the dx-toolkit distribution, which shows all the steps in action and demonstrates installing a package from an external APT repository (see its app code and dxapp.json).

Git-Specific Arguments

The following arguments are supported in execution dependencies where package_manager is set to git:

  • url string (required): The URL pointing to the git repository.

  • tag string (optional): The tag to check out from the repository. Defaults to the default tag of the remote.

  • destdir string (optional): The directory to check the repository out into. It will be created if not present. Defaults to a temporary directory created by mktemp.

  • build_commands string (optional): Arbitrary shell commands to run upon completing the checkout, for example "configure && make && make install".

  • stages array of strings (optional): Same meaning as in other dependency specifications.

Packaging Your Own Resources

Dependencies that are not available readily as packages can be bundled with an executable as data objects linked in the bundledDepends field of the executable's run specification. These can be any type of data objects including files and applets, and they will be made present in the temporary workspace of any job running the executable. Furthermore, any files found in this list of bundled dependencies will automatically be downloaded (and potentially unpacked) in the execution environment to the root directory.

If you are building your executable via the DNAnexus build utility, the tool will archive and upload any local files found in the resources directory in your source tree and extend the bundledDepends list to include this new file object on the DNAnexus platform. When your executable is run, a file that you placed at MyApp/resources/usr/bin/analyze-dna will be available as /usr/bin/analyze-dna in the execution environment. Note: the resources/ subdirectory is unpacked into the root of the virtual filesystem, not the working directory in which your executable is started.
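
Your job code can then invoke the bundled file like any other installed program; a minimal sketch using the example path above (the --version flag is just for illustration):

import subprocess

# /usr/bin/analyze-dna comes from MyApp/resources/usr/bin/analyze-dna,
# unpacked into the root filesystem before the job code runs
subprocess.run(["/usr/bin/analyze-dna", "--version"], check=True)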

Using Application Resource Containers

Application resource containers are platform objects that enclose static or temporary data belonging to the application. Containers behave like projects. There are three types of containers created automatically for apps (only the temporary workspace is available when running applets):

Type | Description | Environment variable | Apps | Applets
---- | ----------- | -------------------- | ---- | -------
Temporary workspace (workspace) | Created whenever an app or applet is run; used for inputs/outputs | DX_WORKSPACE_ID | ✓ | ✓
Project cache container (projectCache) | Container in which data can be cached for future execution by the same version of an app; it is always associated with a particular project | DX_PROJECT_CACHE_ID | ✓ | ✗
Resources container (resources) | Created during app creation, containing any resources the app requires for execution | DX_RESOURCES_ID | ✓ | ✗

For applications written in Python, methods in the dxpy.bindings.dxapp_container_functions module provide convenience functions for accessing these workspaces.
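
Because containers behave like projects, their IDs can be used anywhere a project ID is accepted; a minimal sketch that lists the contents of the temporary workspace:

import os

import dxpy

# The temporary workspace holds the job's inputs and intermediate outputs
workspace_id = os.environ["DX_WORKSPACE_ID"]

for result in dxpy.find_data_objects(project=workspace_id, describe=True):
    print(result["id"], result["describe"]["name"])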

Logging Service

Messages printed by processes running in the execution environment to their standard output and standard error streams are saved to the job log. The job log has a 4 MB size limit, past which messages will be truncated. Job logs prior to the release of the 4 MB size limit (implemented June 20, 2023) will have a limit of 2 MB. Job logs can be monitored in real time through the web interface or on the command line using dx watch.

In addition to logging standard output and standard error, jobs can produce custom log level messages. The valid log levels are:

Source ID | Level | Appears as
--------- | ----- | ----------
DX_APP_STREAM (default) | info (default) | STDOUT
DX_APP_STREAM (default) | error | STDERR
DX_APP | debug | DEBUG
DX_APP | info (default) | INFO
DX_APP | warning | WARNING
DX_APP | error | ERROR
DX_APP | critical | CRITICAL

Using the Python Logger Facility

When running Python programs, you can plug the Python logger facility directly into the DNAnexus logging system described above. To do so, use the following code:

import dxpy, logging
logger = logging.getLogger(__name__)
logger.addHandler(dxpy.DXLogHandler())
logger.propagate = False
logger.setLevel(logging.DEBUG)

The logger object can then be used to log messages at or above the log level specified, e.g. logger.debug("message").

See the help for the dx-log-stream command (dx-log-stream --help) and the dxlog.py file in the DNAnexus SDK for more details.
