Job Lifecycle
Learn about the states a job or analysis passes through during its lifecycle.
Example Execution Tree
In the following example, we have a workflow with two stages: one runs an applet, and the other runs an app.

If the workflow is run, it will generate an analysis with an attached workspace for storing intermediate output from its stages. Jobs are also created to run the two stages. These jobs can in turn spawn more jobs, either to run another function in the same executable or to launch a new executable (an applet or app). The blue labels indicate which jobs or analyses can be described using a particular term (as defined above).

The subjob or child job of stage 1's origin job shares the same temporary workspace as its parent job. Any calls to run a new applet or app using the API methods /applet-xxxx/run or /app-xxxx/run will launch a master job that has its own separate workspace, and (by default) no visibility into its parent job's workspace.
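As a hedged illustration, the sketch below uses the dxpy Python bindings (assumed installed and logged in) to call /applet-xxxx/run; all IDs and the "reads" input name are placeholders. The job returned is the master job of a new job tree, with its own workspace.
```python
# A minimal sketch (dxpy Python bindings, assumed installed and logged in) of
# calling /applet-xxxx/run. All IDs and the "reads" input name are placeholders.
import dxpy

run_response = dxpy.api.applet_run(
    "applet-xxxx",                  # hypothetical applet ID
    {
        "project": "project-xxxx",  # project context for the new job tree
        "input": {"reads": {"$dnanexus_link": "file-xxxx"}},
    },
)
print(run_response["id"])           # e.g. "job-xxxx", the new master job
```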
Job States
Successful Jobs
Every successful job goes through at least the following four states:
1. idle: the initial state of every new job, regardless of which API call created it.
2. runnable: the job's inputs are ready, and it is not waiting for any other job to finish or data object to finish closing.
3. running: the job has been assigned to and is being run on a worker in the cloud.
4. done: the job has completed, and it is not waiting for any descendant job to finish or data object to finish closing. This is a terminal state; no job transitions to another state after reaching done.
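The following minimal Python sketch (using dxpy; the job ID is a placeholder) polls /job-xxxx/describe to watch a job move through these states until it reaches a terminal one.
```python
# A minimal sketch (dxpy, placeholder job ID) that polls /job-xxxx/describe
# until the job reaches a terminal state.
import time
import dxpy

TERMINAL_STATES = {"done", "failed", "terminated"}

def wait_for_terminal_state(job_id, poll_seconds=30):
    """Poll the job's state until it reaches done, failed, or terminated."""
    while True:
        state = dxpy.api.job_describe(job_id)["state"]
        print("current state:", state)  # e.g. idle, runnable, running, done
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)

# wait_for_terminal_state("job-xxxx")
```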

Jobs may also pass through the following transitional states as part of more complicated execution patterns:
waiting_on_input (between idle and runnable): a job enters and stays in this state if at least one of the following is true:
it has an unresolved job-based object reference in its input
it has a data object input that cannot be cloned yet, because either it or a linked hidden object is not in the closed state
it was created to wait on a list of jobs or data objects that must enter the done or closed states, respectively (see the dependsOn field of any API call that creates a job, and the sketch after this list); linked hidden objects are implicitly included in this list
waiting_on_output (between running and done): a job enters and stays in this state if at least one of the following is true:
it has a descendant job that has not yet moved to the done state
it has an unresolved job-based object reference in its output
it is an origin or master job that has a data object (or linked hidden data object) output in the closing state
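As referenced in the list above, the hedged sketch below shows how the dependsOn field might be supplied when launching an applet via dxpy; all IDs are placeholders.
```python
# A hedged sketch of the dependsOn field: the new job stays in waiting_on_input
# until "job-aaaa" is done and "file-bbbb" is closed. All IDs are placeholders.
import dxpy

response = dxpy.api.applet_run(
    "applet-xxxx",
    {
        "project": "project-xxxx",
        "input": {},
        "dependsOn": ["job-aaaa", "file-bbbb"],
    },
)
```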

Unsuccessful Jobs
Two terminal job states exist other than the done state: terminated and failed. A job can enter either of these states from any other state except another terminal state.
Terminated Jobs
The terminated state is entered when a user has requested that the job (or another job that shares the same origin job) be terminated. For all terminated jobs, the failureReason in their describe hash will be set to "Terminated", and the failureMessage will indicate the user responsible for terminating the job. Only the user who launched the job or administrators of the job's project context can terminate the job.
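A minimal sketch (dxpy, placeholder job ID) of terminating a job and then inspecting failureReason and failureMessage in its describe hash:
```python
# A minimal sketch (dxpy, placeholder job ID): terminating a job and then
# inspecting failureReason and failureMessage in its describe hash.
import dxpy

job_id = "job-xxxx"
dxpy.api.job_terminate(job_id)

desc = dxpy.api.job_describe(job_id)
print(desc["state"])               # eventually "terminated"
print(desc.get("failureReason"))   # "Terminated"
print(desc.get("failureMessage"))  # names the user who requested termination
```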
Failed Jobs
Jobs can fail for a variety of reasons, and once a job fails, its failure triggers failure for all other jobs that share the same origin job. If an unrelated job outside that job tree has a job-based object reference to, or otherwise depends on, a failed job, then it will also fail. For more information about errors that jobs can encounter, see the Error Information page.
On the DNAnexus Platform, jobs are limited to a runtime of 30 days. Jobs running longer than 30 days will fail with a JobTimeoutExceeded error.
Restartable Jobs
Jobs can automatically restart when they encounter specific types of failures. You configure which failure types trigger restarts in the executionPolicy of an app, applet, or workflow. Common restartable failure types include:
UnresponsiveWorker
ExecutionError
AppInternalError
JobTimeoutExceeded
How job restarts work
When a job fails for a restartable reason, the system determines where to restart based on the restartableEntryPoints configuration:
master setting (default): The failure propagates to the nearest master job, which then restarts
all setting: The job restarts itself directly
The system will restart a job up to the maximum number of times specified in the executionPolicy. Once this limit is reached, the entire job tree fails.
During the restart process, jobs transition through specific states:
restartable: The job is ready to be restarted
restarted: The job attempt was restarted (a new attempt will begin)
Job try tracking
For jobs in root executions launched after July 12, 2023 00:13 UTC, the platform tracks restart attempts using a try integer attribute:
First attempt: try = 0
Second attempt (first restart): try = 1
Third attempt (second restart): try = 2
Multiple API methods support job try operations and include try information in their responses:
/job-xxxx/describe
/job-xxxx/addTags
/job-xxxx/removeTags
/job-xxxx/setProperties
/system/findExecutions
/system/findJobs
/system/findAnalyses
When you provide a job ID without specifying a try argument, these methods automatically refer to the most recent attempt for that job.
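A minimal sketch (dxpy, placeholder job ID) of inspecting tries via /job-xxxx/describe, with and without the try argument:
```python
# A minimal sketch (dxpy, placeholder job ID) of inspecting job tries.
import dxpy

# Without a "try" argument, describe refers to the most recent attempt.
latest = dxpy.api.job_describe("job-xxxx")
print(latest.get("try"))            # e.g. 2 for the third attempt

# Passing "try" selects a specific earlier attempt (here the first one).
first_attempt = dxpy.api.job_describe("job-xxxx", {"try": 0})
print(first_attempt["state"])       # e.g. "restarted"
```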
Additional States
Unsuccessful jobs may enter two additional states between the running state and their eventual terminal state of terminated or failed. Unsuccessful jobs in any other non-terminal state transition directly to the appropriate terminal state.
terminating: the transitional state when the worker in the cloud has begun terminating the job and tearing down the execution environment. Once the worker in the cloud has reported that it has terminated the job or otherwise becomes unresponsive, then the job will transition to its terminal state.
debug_hold: a job has been run with debugging options and has failed for an applicable reason, and is being held for debugging by the user. For more information about triggering this state, see the Connecting to Jobs page.

Analysis States
All analyses start in the state in_progress, and, like jobs, will end up in one of the terminal states done, failed, or terminated. The following diagram shows the state transition for all successful analyses.

If an analysis is unsuccessful, it may transition through one or more intermediate states before it reaches its terminal state:
partially_failed: this state indicates that one or more stages in the analysis have not finished successfully, and at least one stage has not yet transitioned to a terminal state. In this state, some stages may have already finished successfully (and entered the done state), and the remaining stages will also be allowed to finish successfully if they can.
terminating: an analysis may enter this state either via an API call in which a user has terminated the analysis, or under a failure condition that causes the analysis to terminate its remaining stages. This may happen if the executionPolicy for the analysis (or a stage of the analysis) has the onNonRestartableFailure value set to "failAllStages".
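The hedged sketch below shows one way "failAllStages" might be requested. It assumes an executionPolicy with onNonRestartableFailure can be supplied when running a workflow; all IDs are placeholders, and the field should be verified against the API documentation.
```python
# A hedged sketch only: this assumes an executionPolicy with
# onNonRestartableFailure can be supplied when running a workflow.
# All IDs are placeholders; verify the field against the API documentation.
import dxpy

response = dxpy.api.workflow_run(
    "workflow-xxxx",
    {
        "project": "project-xxxx",
        "input": {},
        "executionPolicy": {
            # Terminate the remaining stages instead of letting them finish
            # when any stage fails for a non-restartable reason.
            "onNonRestartableFailure": "failAllStages"
        },
    },
)
print(response["id"])               # e.g. "analysis-xxxx"
```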

Billing
In general, compute and data storage costs due to jobs that end up failing because of user error are still charged to the project in which the jobs were run. This includes errors such as InputError and OutputError. The same applies to terminated jobs. For internal errors of the DNAnexus Platform, such costs will not be billed.
The costs for each stage in an analysis are determined independently. If the first stage finishes successfully while a second stage fails with a system error, the first stage will still be billed, and the second will not.