Running Analyses

This section describes the API for creating and running analyses on the DNAnexus Platform.

Jobs

The job is the unit of execution on the DNAnexus Platform. For every job, a worker is spun up in the cloud, then the job's code is downloaded to that worker and executed. The job may make API calls, perform computations, or spawn other jobs.

For more on the lifecycle of executions on the DNAnexus Platform, see the Job and Analysis Lifecycles page.

Types of Executables

Three types of executables can be run in the course of a job: applets, apps, and workflows.

Applets are data objects that reside in projects, and are the fundamental building block of all executables on the Platform. Applets contain all the data and metadata required to run a job.

Each app is an applet that's been packaged to facilitate versioning and easy sharing with other users. Like applets, each app produces a job when run. Unlike applets, apps are not data objects, and do not reside in projects.

Workflows are data objects that contain the necessary metadata for creating a pipeline of one or more apps or applets, so these can be run in a specific sequence, as a single analysis. Unlike apps and applets, each workflow produces not a single job, but rather a series of jobs that are run in the course of executing the full pipeline.

Components of an Applet

Whenever a job runs on a worker, it is running an applet, either as such, or packaged into an app.

An applet has some or all of the following components:

  • Input specification: If included, input specifications detail the characteristics of named inputs to be provided to the applet. For example, it might be specified that, for an input called "reads," a file be provided.

  • Output specification: If included, output specifications detail the characteristics of outputs generated by the applet. For example, it might be specified that, for an output field called "mappings," the applet will generate a file.

  • Code: This is the code that is actually run on the worker. The code must be written in bash or Python 3. It can consist of multiple functions, called entry points. See Code Interpreters for more information on writing entry points.

  • Bundled files: If an applet requires additional files or programs that a developer has compiled - perhaps written in a different language, such as C++ - these can be bundled with the applet and made available when it is run.

  • Additional resource requirements: If the applet requires specific additional resources to run, these can be specified. These might include additional computational power or memory, software packages, additional network access, and special project permissions. For details on how to write these specifications, see the Run Specification and Access Requirements sections of the I/O and Run Specifications page.
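As a rough illustration of how the components listed above fit together, the sketch below writes an applet's metadata as a plain Python dictionary. The field names (inputSpec, outputSpec, runSpec, access) are assumptions modeled on common DNAnexus conventions, and the applet name and file path are hypothetical. The helper missing_inputs is not part of any Platform library. Consult the I/O and Run Specifications page for the authoritative format.

```python
# Hypothetical applet metadata, written as a plain Python dict.
# Field names are assumptions, not an authoritative schema.
applet_spec = {
    "name": "example_mapper",  # hypothetical applet name
    "inputSpec": [
        {"name": "reads", "class": "file"},
    ],
    "outputSpec": [
        {"name": "mappings", "class": "file"},
    ],
    "runSpec": {
        "interpreter": "python3",
        "file": "src/code.py",  # hypothetical path to the applet's code
    },
    "access": {
        "network": ["*"],    # additional network access
        "project": "VIEW",   # requested project permission
    },
}


def missing_inputs(spec, provided):
    """Return the names of required inputs declared in spec but absent from provided."""
    return [
        field["name"]
        for field in spec.get("inputSpec", [])
        if not field.get("optional", False) and field["name"] not in provided
    ]
```

Calling missing_inputs(applet_spec, {}) reports that "reads" has not been supplied, which is the kind of check an input specification makes possible.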

Jobs

When an applet or app is run, a job is created, and the main function, or entry point, of the applet's code is executed on a worker node on the DNAnexus Platform. This code must be written in bash or Python 3, though it can spawn other Linux processes - for example, by running executables written in other languages - to perform tasks.
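The idea of named entry points can be pictured with a toy dispatcher. This is only a sketch: the entry_point decorator, the ENTRY_POINTS registry, and run_entry_point are hypothetical stand-ins written for this example, not the Platform's own mechanism, which is described in Code Interpreters. On the Platform, calling another entry point creates a new job; here it is a plain function call.

```python
# Toy registry of entry points (hypothetical; for illustration only).
ENTRY_POINTS = {}


def entry_point(name):
    """Register a function under the given entry point name."""
    def register(func):
        ENTRY_POINTS[name] = func
        return func
    return register


def run_entry_point(name, job_input):
    """Look up an entry point by name and call it with the job input."""
    return ENTRY_POINTS[name](**job_input)


@entry_point("main")
def main(reads):
    # The main entry point delegates work to another entry point.
    processed = run_entry_point("process", {"reads": reads})
    return {"mappings": processed["result"]}


@entry_point("process")
def process(reads):
    # Placeholder computation standing in for real work.
    return {"result": f"mapped:{reads}"}
```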

The job runs in the Execution Environment, a fully capable, isolated Linux environment. The DNAnexus Platform API server is always available to the job.

Like data objects, jobs can be tagged with, and searched by, metadata.

Job Hierarchy

A job can launch other jobs in two ways: by running an executable directly, for example via the API calls /applet-xxxx/run or /app-xxxx/run, or by calling another entry point in its own executable via the API call /job/new. Jobs launched by another job are called child jobs; the job that launched them is called the parent job. The original job, created when a user runs an applet or app, is called an origin job. A job created when another job runs an applet or app is called a master job.
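The terms above can be made concrete with a small in-memory model. This is a sketch under stated assumptions: the Job class and its runs_executable flag are hypothetical, and the model treats the origin job as the master of any jobs it spawns through /job/new.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    parent: Optional["Job"] = None
    # True if the job was created by running an applet or app
    # (/applet-xxxx/run or /app-xxxx/run); False if it was created by
    # calling an entry point via /job/new.
    runs_executable: bool = True

    def origin(self) -> "Job":
        """Walk up the parent chain to the job the user originally launched."""
        job = self
        while job.parent is not None:
            job = job.parent
        return job

    def master(self) -> "Job":
        """Return the nearest job, starting from this one, that ran an executable."""
        job = self
        while not job.runs_executable and job.parent is not None:
            job = job.parent
        return job


# Example hierarchy with placeholder job names.
origin = Job("job-origin")
subjob = Job("job-sub", parent=origin, runs_executable=False)   # via /job/new
launched = Job("job-launched", parent=subjob)                    # via /applet-xxxx/run
grandchild = Job("job-grandchild", parent=launched, runs_executable=False)
```

In this hierarchy, subjob belongs to the origin job's executable, while grandchild belongs to the executable run by launched, even though all four jobs share the same origin.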

For a list of job-related terms, and their definitions, see this Glossary.

Jobs can depend on each other or on data objects, so that, for example, a job might not start until other jobs are finished or certain data objects are closed. These dependencies can be implicit, via Job-based Object References provided in the input, or explicit, via the dependsOn field in the API call made to create the new job.
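The two kinds of dependency can be sketched as request payloads built in plain Python. The helper names are hypothetical, and the exact JSON shapes (a "$dnanexus_link" wrapper with "job" and "field" keys, and a payload with "function", "input", and "dependsOn" keys) are assumptions that should be checked against the API reference. No API call is made here.

```python
def job_output_ref(job_id, field):
    """Build a job-based object reference to one output field of another job."""
    return {"$dnanexus_link": {"job": job_id, "field": field}}


def implicit_dependencies(job_input):
    """Collect the job IDs referenced anywhere in the input."""
    found = set()

    def walk(value):
        if isinstance(value, dict):
            link = value.get("$dnanexus_link")
            if isinstance(link, dict) and "job" in link:
                found.add(link["job"])
            else:
                for item in value.values():
                    walk(item)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(job_input)
    return found


def new_job_payload(function, job_input, depends_on=()):
    """Assemble a payload for creating a job that runs the named entry point."""
    payload = {"function": function, "input": job_input}
    if depends_on:
        payload["dependsOn"] = list(depends_on)  # explicit dependencies
    return payload


# Placeholder IDs; "job-aaaa" is an implicit dependency, "job-bbbb" an explicit one.
payload = new_job_payload(
    "postprocess",
    {"mappings": job_output_ref("job-aaaa", "mappings")},
    depends_on=["job-bbbb"],
)
```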

Project Context and Temporary Workspace

An executable is always launched from a particular project. Any jobs descended from the resulting origin job inherit its project context. Project context is significant in several ways.

Usage Charges

The project is billed for all usage charges resulting from the execution of both the origin job and all its child jobs.

Project Permissions

When launching an executable from within a project, a user must have "CONTRIBUTE" access to that project. This enables the origin job, when outputting data objects, to place them in the project.

By default, an applet has "VIEW" permission to the project in which it resides. An app, meanwhile, has no default project permission setting.

Applets and apps can require, in their access requirements, that they be given specific permissions to the projects of any user launching them.

Temporary Workspaces

Jobs running as part of the same executable - either an origin or master job and all its descendants - always share the same temporary workspace. This workspace is a container for objects that the executable can read from and write to, on the Platform.

Note that these temporary workspaces are distinct from the local disk that each job receives on its worker node. Jobs must explicitly upload data to the Platform in order to share it with other jobs, or deliver it as output.

Temporary workspaces behave like projects, except they cannot be explicitly created or destroyed, and their permissions are fixed. See Data Containers for more about Platform data containers. See Containers for Execution for specifics on the types of containers involved in app and applet execution.

Jobs always receive "CONTRIBUTE" permission to their temporary workspace. When provided as inputs to an executable, data objects, and all hidden objects to which they link, are cloned into the workspace before the executable begins running. If any of these objects reside in projects other than the one in which the executable is being run, the user or job launching the executable must have "VIEW" access to those other projects, and those other projects must not have the "RESTRICTED" flag set. Upon completion of the job, output objects are cloned into the project from which the executable was launched, and the workspace is destroyed.
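The cloning rule for inputs from other projects can be expressed as a small check. This is a simplified model, not Platform code: the function names, the "NONE" level, and the data structures are assumptions made for illustration, and the permission ordering VIEW < UPLOAD < CONTRIBUTE < ADMINISTER is assumed.

```python
# Assumed ordering of project permission levels, lowest to highest.
LEVELS = ["NONE", "VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]


def has_access(actual, required):
    """Return True if the actual level is at least the required level."""
    return LEVELS.index(actual) >= LEVELS.index(required)


def can_clone_inputs(input_projects, launch_project, launcher_access, restricted):
    """Check whether inputs can be cloned into the temporary workspace.

    input_projects: project IDs that hold the input objects
    launch_project: project ID from which the executable is run
    launcher_access: dict mapping project ID to the launcher's permission level
    restricted: set of project IDs that have the RESTRICTED flag set
    """
    for project in input_projects:
        if project == launch_project:
            continue  # the rule applies only to other projects
        if not has_access(launcher_access.get(project, "NONE"), "VIEW"):
            return False
        if project in restricted:
            return False
    return True
```

For example, an input in a project where the launcher has "VIEW" access passes the check unless that project is in the restricted set.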

Data Object State and Job Input and Output

A job cannot start until data objects it uses as input are ready. Since the system must clone these data objects into a job's temporary workspace, the job will not start until the state of these objects is "closed." Likewise, on the conclusion of a run, if an origin or master job is to output any objects, its state will be "waiting_on_output" until output objects have transitioned to the "closed" state.
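The two conditions in this paragraph can be sketched as simple predicates over object states. The function names are hypothetical, and the "done" return value is an assumption about what follows "waiting_on_output"; see the Job and Analysis Lifecycles page for the full set of states.

```python
def ready_to_start(input_states):
    """A job can start only when every input data object is closed.

    input_states maps an object ID to its state, such as "open" or "closed".
    """
    return all(state == "closed" for state in input_states.values())


def state_after_run(output_states):
    """Return the state of an origin or master job once its code has finished."""
    if all(state == "closed" for state in output_states.values()):
        return "done"  # assumed terminal state once outputs are closed
    return "waiting_on_output"
```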

Example: Inputs from Different Projects

Example: Chained Execution

If an applet, while running, launches another applet, then the project context is carried forward, but a new workspace is made for the launched applet. The launched applet has "VIEW" access to the original project, and "CONTRIBUTE" access to its own workspace - but no access to the workspace of the applet that launched it. When the launched applet is done, any objects output by its job are cloned back into the workspace of the parent applet.

The figure below illustrates an example where Applet1 produces Object C as output, and then provides Object C as an input, when launching Applet2.

Because Applet1 was launched from Project A, both Applet1's jobs and Applet2's jobs have "VIEW" access - indicated by the black arrows - to Project A. But Applet1's jobs do not have any permissions to the temporary workspace used by Applet2's jobs; nor do Applet2's jobs have any permissions to the temporary workspace used by Applet1's jobs.

Note that if Applet2 were an app rather than an applet, its jobs would have no access to Project A. If an app launches an applet, meanwhile, the app's permissions define the maximum access level at which the applet can be run. Thus the applet, in this scenario, would have no access to the project context.
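The default project-context access described in this example can be summarized in a short function. This is a simplified sketch that covers only the defaults stated above and ignores any access requirements an executable declares. The function name, the "NONE" level, and the permission ordering are assumptions.

```python
# Assumed ordering of project permission levels, lowest to highest.
LEVELS = ["NONE", "VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]

# Default project access by executable type, as described in the text.
DEFAULT_ACCESS = {"applet": "VIEW", "app": "NONE"}


def launched_project_access(launcher_access, launched_kind):
    """Return the launched executable's access to the project context.

    launcher_access: the launching job's own access to the project context
    launched_kind: "applet" or "app"
    The launcher's access caps what the launched executable can receive.
    """
    default = DEFAULT_ACCESS[launched_kind]
    capped = min(LEVELS.index(default), LEVELS.index(launcher_access))
    return LEVELS[capped]
```

Under these assumptions, an applet launched by an applet keeps "VIEW" access, while an applet launched by an app with no project permission receives none.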
