App Execution Environment
The Execution Environment is the system on which your app executes when running on the DNAnexus Platform. Currently, the Platform supports Ubuntu Linux 24.04 and 20.04. The App API lets you specify the computational resources your app needs (the instance type it will be launched on) and the software packages it requires as dependencies.
Key Concepts
Jobs
When you send an /app-id/run, /applet-id/run, or /job/new call to the DNAnexus API, a job object is generated, then dispatched to a worker node when its inputs are ready and it is considered "runnable."
Running a Job
The worker node performs the following steps:
1. Generates a virtualized Linux container (virtual machine) just for your job. The container is a full-featured Linux OS. If your job runs a sub-job using /job/new, the sub-job gets an independent virtual machine, so each individual job is free to use all the resources of its instance (see the sketch after this list).
2. Reads the runSpec.execDepends field of your app and installs packages in the container.
3. Configures networking and logging services in the container.
4. Fetches the code given in the runSpec.code field of the app and saves it in the working directory inside an interpreter-specific execution template.
5. Downloads any file objects found in runSpec.bundledDepends to the root directory / and unpacks them if compressed (with a mechanism that supports at least TAR, GZ, and other popular formats).
6. For cluster jobs, executes the bootstrap script (if provided) on all nodes. At this point, clusterWorker nodes should be fully initialized; they do not perform the following step of executing the job code.
7. Executes the code with the interpreter given in runSpec.interpreter.
8. Waits for the code to complete, reports the output or any errors to the platform, and destroys the virtual machine.
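As an illustration of the sub-job mechanism from step 1, an entry point can launch a sub-job through the Python bindings. This is a minimal sketch; the entry point name "myfunc" and its input are hypothetical:
import dxpy

# Launch the entry point "myfunc" as a sub-job on its own virtual
# machine; this is equivalent to a /job/new API call.
subjob = dxpy.new_dxjob(fn_input={"x": 1}, fn_name="myfunc")
print(subjob.get_id())  # e.g. job-xxxx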
The rest of this page describes what happens in the virtual machine, what your executable must do to report its output and any errors, and how you can request or provide additional resources for your job.
Environment Variables in the Container
The following environment variables are set in the container by the system before running your code. Their values communicate the information necessary to access the API and your job's data.
The variables are automatically consumed by the language bindings provided by DNAnexus, so there is often no action necessary to use them. When accessing the variables from within a WDL script or other custom implementation, you'll need to load the environment variables in the shell using source /home/dnanexus/environment.
DX_APISERVER_HOST: the API server hostname (example: 10.0.3.1)
DX_APISERVER_PORT: the API server port (example: 443)
DX_GWF_RESOURCES_ID: the ID of the most immediate global workflow parent container object on the platform (example: container-xxxx)
DX_JOB_ID: the ID of the job executing in the Execution Environment (example: job-xxxx)
DX_WORKSPACE_ID: the ID of the job workspace container object on the platform (example: container-xxxx)
DX_RESOURCES_ID: the ID of the app global resources container object on the platform (example: container-xxxx)
DX_PROJECT_CACHE_ID: the ID of the app project cache container object on the platform (example: container-xxxx)
DX_PROJECT_CONTEXT_ID: the ID of the project context, i.e., the project billed for this computation and where the outputs of the origin job will appear (example: project-xxxx)
DX_SECURITY_CONTEXT: the authentication information needed to access the API (example: {"auth_token":"outside","auth_token_type":"Bearer"})
Variables only present when the job is running on a cluster:
DX_CLUSTER_MASTER_IP: the private IP address of the cluster driver node; present only on clusterWorker nodes (example: 172.168.1.120)
DX_CLUSTER_LOCAL_IP: the private IP address of this cluster node (example: 10.1.1.101)
DX_CLUSTER_HOSTNAME: the FQDN of this cluster node (example: ec2_instance_public.hostname)
DX_CLUSTER_NODE_ID: a unique integer ID for each node in the cluster; node 0 is always the driver node (example: 1)
Variables only present when the job is running on a cluster of type dxspark:
DX_CLUSTER_METASTORE_URI: the URI for accessing the Hive metastore (example: hivemetastore.uri)
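The dxpy bindings pick these variables up automatically, but a script can also read them directly from the environment. A minimal sketch in Python:
import os

# Values are set by the system before the job code runs.
job_id = os.environ["DX_JOB_ID"]                # e.g. job-xxxx
project = os.environ["DX_PROJECT_CONTEXT_ID"]   # e.g. project-xxxx
print("Job", job_id, "is billed to", project)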
Job I/O and Error Reporting
Job input, output, and error data are passed through JSON files in the working directory where your code runs.
Special Files in the Initial Working Directory
job_input.json: serialized JSON contents of job.input for the currently running job. Written by the system. Example: {"x": 1, "y": true}
job_output.json: serialized JSON contents of the output of the currently running job, to be saved into the job object by the system. Written by bindings or client code. Example: {"z": {"$dnanexus_link": "file-xxxx"}}
job_error.json: serialized JSON contents of an error encountered by the currently running job, if any. Written by bindings or client code. Example: {"error": {"type": "AppError", "message": "x must be at least 2"}}
job_input.json
Before executing your code, the system saves the job input in the file job_input.json in the working directory where your code will run. You can either read this file directly, or rely on the execution template and language-specific bindings code (if available) to read it in and provide the input for you. For example, the Python language bindings read in the job input and pass it as keyword arguments directly to your entry point function.
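For instance, a job could parse the file itself rather than relying on the bindings. A minimal sketch; the input field "x" is the one from the example above:
import json

# Read the raw job input from the working directory.
with open("job_input.json") as f:
    job_input = json.load(f)
x = job_input["x"]  # e.g. 1, per the example input above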
job_output.json
When your code has finished running, it must return to the system the values it wants to save in the output field of the job object representing the current job. This is done by serializing these values into the file job_output.json in the original working directory. You can either do this yourself, or rely on the execution template and language-specific bindings code (if available) to save the output for you. For example, the Python language bindings expect your entry point function to return a hash with the output values, and serialize that for you.
NOTE: An empty hash ({}) must be saved to job_output.json even if your applet does not have an output spec.
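If you manage the output yourself, the serialization amounts to writing one JSON file. A minimal sketch; the output field "z" and file ID are the placeholders from the table above:
import json

# Save the job's output values; write {} if there is no output spec.
output = {"z": {"$dnanexus_link": "file-xxxx"}}
with open("job_output.json", "w") as f:
    json.dump(output, f)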
job_error.json
If your code encounters a fatal error condition, it must exit with a non-zero exit status (raising an error or throwing an exception will make this happen in most languages). To help debugging, it is also recommended that the job provide extended information about the error. Depending on the interpreter, throwing an exception may be sufficient to report an error message, or you may have to write the file job_error.json directly. The system will inspect the contents of this exception or file and set the failure metadata for the job object accordingly.
The file should be formatted like so:
{"error": {"type": "AppInternalError", "message": "Error while running micromap"}}
Error Types
The field error.type in the file job_error.json should be set to one of the recognized error types.
AppError: a recognized, actionable error. Use this to request corrective action by the user, such as changing the application input. The error message is exposed in the UI. Example: {"error": {"type": "AppError", "message": "Out of memory: Please select a larger instance type for your job"}}
AppInternalError: an unexpected application error. Use this to indicate an error that requires debugging. The error message is not exposed in the UI. Example: {"error": {"type": "AppInternalError", "message": "Division by zero at line 256"}}
Monitoring Jobs
The stdout and stderr of every running job are automatically captured and logged for you, and you can access these logs through the API as the job is running or after it has finished.
Debugging and Connecting to Jobs via SSH
Jobs can optionally be configured to allow SSH connections from a specified range of IPs, and to hold the execution environment for debugging when certain types of errors occur (debug hold).
For more information, see Connecting to jobs via SSH.
Code Interpreters
Apps and applets can use either the "bash" or the "python3" interpreter.
python3
The Python 3 interpreter makes it easy to write apps in Python.
Entry Points
To choose entry points in your Python script, decorate the functions with @dxpy.entry_point("entry_point_name"). The following code snippet demonstrates when each part of your script will be run.
import dxpy

@dxpy.entry_point("myfunc")
def myfunc():
    # Gets run when you make a /job/new API call with "function" set to "myfunc"
    pass

@dxpy.entry_point("main")
def main():
    # Gets run when you make an /app(let)-xxxx/run API call OR
    # a /job/new API call with "function" set to "main"
    pass

# The following line will call the appropriate entry point.
dxpy.run()
Job Input
While the job's input will always be provided in the file job_input.json, the Python interpreter will also provide the key-value pairs as keyword arguments when calling your entry points.
Exception Handling
If your app throws dxpy.AppError, the interpreter will report the job failure with failure reason AppError. In general, this error should be used for errors resulting from invalid user input. If your app throws an exception of any other class, the job will report the failure as AppInternalError.
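For example, an entry point validating its input might raise dxpy.AppError directly. A minimal sketch; the input "x", the threshold, and the output field "y" are illustrative:
import dxpy

@dxpy.entry_point("main")
def main(x):
    if x < 2:
        # Reported with failure reason AppError and shown to the user
        raise dxpy.AppError("x must be at least 2")
    # Any other exception type would be reported as AppInternalError
    return {"y": x * 2}

dxpy.run()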
Bash
This is the general-purpose interpreter which you can use to run whatever shell commands and/or executables you may have packaged together with your app or applet.
Entry Points
To create multiple entry points for your bash executable, simply create bash functions with the same names as your entry points. The following code snippet demonstrates when each part of your script will be run.
# Anything outside the function declarations is always run

myfunc() {
  # Gets run when you make a /job/new API call with "function" set to "myfunc"
  :  # ':' is a no-op placeholder; a bash function body cannot be empty
}

main() {
  # Gets run when you make an /app(let)-xxxx/run API call OR
  # a /job/new API call with "function" set to "main"
  :
}
Job Input
While the job's input will always be provided in the file job_input.json, the bash interpreter will also set an environment variable for each key in the job input, with value equal to the key's value. Case is preserved.
Exception Handling
Your bash script is interpreted with the -e flag set, so if any command exits with a nonzero exit code, your app will fail at that point with failure reason AppInternalError. To report an error with a more helpful message, write the file job_error.json before letting a command exit with a nonzero exit code.
Available Resources
Computational Power and Memory
Default machine sizes vary by region; the default mapping per region is shown below. For more precise specifications, see Instance Types.
aws:us-east-1: mem2_hdd2_x2
aws:ap-southeast-2: mem2_hdd2_x2
aws:eu-central-1: mem1_ssd1_x4
azure:westus: azure:mem2_ssd1_x1
azure:westeurope: azure:mem2_ssd1_x1
If you need more computational resources for your app, you can request a different machine instance type in the runSpec.systemRequirements.instanceType field of your dxapp.json.
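For example, the following dxapp.json fragment requests a larger instance type (a sketch; the instance type shown is one of the defaults listed above, and the "*" key applies the setting to every entry point):
{
  "runSpec": {
    "systemRequirements": {
      "*": {"instanceType": "mem1_ssd1_x4"}
    }
  }
}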
Choosing an Application Execution Environment
To specify the Application Execution Environment, set the runSpec.distribution, runSpec.release, and runSpec.version fields in your dxapp.json using the values in the table below:
"Ubuntu", "24.04", "0"
python3, bash
"Ubuntu", "20.04", "0"
python3, bash
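For example, to target Ubuntu 24.04 with the bash interpreter, a dxapp.json might contain the following (a sketch showing only the relevant runSpec fields):
{
  "runSpec": {
    "interpreter": "bash",
    "distribution": "Ubuntu",
    "release": "24.04",
    "version": "0"
  }
}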
Network Access
Networking is pre-configured in the execution environment. Network access is restricted by default and must be requested explicitly using the access.network field of your dxapp.json file, or of /applet/new or /app/new. For example, use {"access": {"network": ["*"]}} to request unrestricted access.
When network access is restricted, the following are disabled:
Outgoing communication
DNS resolution (except for domains for services that remain available, as listed below)
Access to DBClusters
The following remain available when network access is restricted:
Access to the DNAnexus API server
Access to DNAnexus project data
The ability to install Ubuntu packages from both official Ubuntu and DNAnexus repositories
The ability to SSH into the job
The ability to HTTPS into the job, if it is an httpsApp job
Communication between cluster nodes
Thrift
The DNAnexus Platform Metastore
The Platform Vizserver
Snowflake
Software Packages
DNAnexus Utilities
The contents of the DNAnexus toolkit are available in the container, and environment variables such as PATH, PYTHONPATH, PERL5LIB, and other language-specific paths are automatically set before your app runs, so you can run utilities from the SDK simply as dx and other commands, as well as import the bindings libraries in scripting languages.
External Utilities
If your program relies on packages that must be present in the system to run, you can specify them in the dxapp.json (or directly in the Run Specification input to /app/new) like so:
{ "runSpec": {
"execDepends": [
{"name": "samtools"},
{"name": "bedtools", "version": "2.16.1-1", "stages": ["work"]},
{"name": "dx-toolkit",
"package_manager": "git",
"url": "https://github.com/dnanexus/dx-toolkit.git",
"tag": "master",
"destdir": "/opt/dx-toolkit",
"build_commands": "make install DESTDIR=/ PREFIX=/opt/dnanexus"},
{"name": "pysam",
"package_manager": "pip",
"version": "0.7.4"},
{"name": "Bio::SeqIO",
"package_manager": "cpan",
"version": "1.0b3"},
{"name": "bio",
"package_manager": "gem",
"version": "1.4.3"},
{"name": "plyr",
"package_manager": "cran",
"version": "1.8.1"},
{"name": "ggplot2",
"package_manager": "cran"}
]
...
},
...
}
Here, the first dependency is an APT package. The second dependency is also an APT package, but it specifies a particular version and limits installation to just the "work" entry point; entry points are referred to as stages in this context, and by default, dependencies are installed for all entry points. The third dependency instructs the system to fetch directly from a Git repository, and the rest are dependencies for language-specific package managers.
NOTE: The requested APT packages will be installed, but their "Recommends" will not be installed. You can simulate this behavior with apt-get install --no-install-recommends PACKAGES ... on an Ubuntu system.
NOTE: To access any repository other than APT, your app or applet must request network access to the repository's host by adding an entry like {"access": {"network": ["*"]}} to its dxapp.json metadata.
External APT Repositories
Loading APT packages in your execDepends only works for packages that are part of the default Ubuntu repositories. To install a package from a third-party repository, you need to configure the repository at the beginning of your app code:
1. Configure APT to use the desired repository.
2. Bypass the Execution Environment's built-in APT caching proxy.
3. Ensure your app has sufficient network permissions.
See the external_apt_repo_example app in the dx-toolkit distribution, which shows all the steps in action and demonstrates installing a package from an external APT repository (app code and dxapp.json).
Git-Specific Arguments
The following arguments are supported in execution dependencies where package_manager is set to git:
url, string (required): the URL pointing to the git repository.
tag, string (optional): the tag to check out from the repository. Defaults to the remote's default branch.
destdir, string (optional): the directory to check the repository out into. It will be created if not present. Defaults to a temporary directory created by mktemp.
build_commands, string (optional): arbitrary shell commands to run after the checkout completes, for example, "configure && make && make install".
stages, array of strings (optional): same meaning as in other dependency specifications.
Packaging Your Own Resources
You can include dependencies that aren't available as packages by bundling them with your executable using the bundledDepends field in the run specification. These dependencies can be any type of data object, such as files or applets, and will be available in the temporary workspace when your job runs. Any files included in these bundled dependencies will be automatically downloaded to the root directory of the execution environment and unpacked if they are compressed.
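A bundled dependency is declared by name and file ID. A minimal sketch of the relevant run specification fragment; the archive name and file ID are placeholders:
{
  "runSpec": {
    "bundledDepends": [
      {"name": "resources.tar.gz",
       "id": {"$dnanexus_link": "file-xxxx"}}
    ]
  }
}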
If you build your executable using the DNAnexus build utility, the tool will automatically archive and upload any files from the resources directory in your source tree. It will then add this new file object to the bundledDepends list on the DNAnexus Platform. When your executable runs, files will be placed in the same directory structure you used in the resources folder. For example, if you placed a file at MyApp/resources/usr/bin/analyze-dna, it will be available as /usr/bin/analyze-dna in the execution environment. Note that files from the resources/ subdirectory are unpacked into the root filesystem, not the working directory where your executable starts.
Using Application Resource Containers
Application resource containers are platform objects that enclose static or temporary data belonging to the application. Containers behave like projects. The Platform automatically creates three types of containers for apps (only the temporary workspace is available when running applets):
workspace (DX_WORKSPACE_ID): created whenever an app or applet is run; used for inputs/outputs. Available to both apps and applets.
projectCache (DX_PROJECT_CACHE_ID): a container in which data can be cached for future execution by the same version of an app; it is always associated with a particular project. Available to apps only.
resources (DX_RESOURCES_ID): created during app creation, containing any resources the app requires for execution. Available to apps only.
For applications written in Python, the dxpy.bindings.dxapp_container_functions module provides convenience functions for accessing these containers.
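For instance, a job can open one of these containers by the ID in its environment and list the objects inside. A minimal sketch (assuming the job was launched from an app, so the project cache container exists):
import os
import dxpy

# Open the project cache container and list the data objects in it.
cache = dxpy.DXContainer(os.environ["DX_PROJECT_CACHE_ID"])
for obj in dxpy.find_data_objects(project=cache.get_id()):
    print(obj["id"])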
Logging Service
Messages printed by processes running in the execution environment to their standard output and standard error streams are saved to the job log. The job log has a 4 MB size limit, past which messages will be truncated. Job logs created before June 20, 2023 have a limit of 2 MB. Job logs can be monitored in real time through the web interface or on the command line using dx watch.
Besides logging standard output and standard error, jobs can produce custom log level messages. The valid log levels are:
DX_APP_STREAM (default), level info (default): logged as STDOUT
DX_APP_STREAM (default), level error: logged as STDERR
DX_APP, level debug: logged as DEBUG
DX_APP, level info (default): logged as INFO
DX_APP, level warning: logged as WARNING
DX_APP, level error: logged as ERROR
DX_APP, level critical: logged as CRITICAL
See the help for the dx-log-stream command (dx-log-stream --help) and the dxlog.py file in the DNAnexus SDK for more details.
Using the Python Logger Facility
When running Python programs, you can plug the Python logger facility directly into the DNAnexus logging system described above. To do so, use the following code:
import dxpy, logging
logger = logging.getLogger(__name__)
logger.addHandler(dxpy.DXLogHandler())
logger.propagate = False
logger.setLevel(logging.DEBUG)
The logger object can then be used to log messages at or above the specified log level, for example, logger.debug("message").