Job Input and Output

Input

When launching an executable, either from outside of the platform, or from another executable that is already running, an input must be provided as part of the JSON in the API call. This input is the mechanism via which objects can be passed into the executable. If the objects are not in a publicly-viewable location or in a location otherwise made accessible to a job running the executable, then the objects need to be cloned into the workspace for the job to be able to access them. For subjobs (i.e. jobs created via the API call /job/new), no cloning occurs because they share the same workspace as their parent jobs.

The input to a job must be a hash. The system treats this hash slightly differently, depending on whether the executable has a formally defined input specification.
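
For orientation, a minimal /applet-xxxx/run call body might look like the following sketch; the "reads" input field name and the IDs shown are placeholders rather than values from any real project:

{
  "project": "project-B1350vfK8yjjKz3q00q00001",
  "input": {
    "reads": {"$dnanexus_link": "file-B2QkQvyK8yjQ48y890400012"}
  }
}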

If there is a formally defined input specification, the hash needs to abide by that. More specifically:

  • The keys of the hash must correspond to names of inputs as defined in the spec.

  • The value for a particular key must be compatible with the class of the input as defined in the spec. Value compatibility is summarized below; a combined example follows the list:

    • int: numeric literal, e.g. 32

    • float: numeric literal, e.g. 3.14

    • string: UTF-8 string, e.g. "CFTR, HOXA"

    • boolean: true/false, e.g. true

    • hash: JSON hash, e.g. {"hello": "world", "foo": true}

    • any data object class (e.g. record or file): a DNAnexus link, either {"$dnanexus_link": _objectid_} or {"$dnanexus_link": {"project": _projectid_, "id": _objectid_}}, e.g. {"$dnanexus_link": "file-B2QkQvyK8yjQ48y890400012"} or {"$dnanexus_link": {"project": "project-B1350vfK8yjjKz3q00q00001", "id": "file-B2QkQvyK8yjQ48y890400012"}}

    • array of a class (other than hash): a JSON array of values of that class and/or nested arrays of that class; the array will be **flattened** before being given as input to the executable. For class "array:int", both [1, 2, -4, 104] and [1, [2, -4], [[104]]] are interpreted as the same value. For class "array:file", an example is [{"$dnanexus_link": "file-B2QkQvyK8yjQ48y890400012"}, {"$dnanexus_link": "file-B2p0BZKK8yjg1JYyyQJQ000K"}]

Job Dependencies

In some cases, the value of an input parameter may not be known at the time the job is created, but will instead be obtained from the output of some other job, which may still be running. In this case, a job-based object reference can be provided. A job-based object reference is a JSON hash with the key "job", containing the ID of a job in the same project, and the key "field", naming the key of that job's output hash whose value will supply the value of the input parameter.

Note that the target value to which a job-based object reference eventually resolves will be whatever JSON value the other job provided. If that value (or any other input field) does not match the input spec of the job using it as input, the job will fail with failure reason "InputError". If the input spec mandates a type constraint for a particular input, the types associated with the object given as input will also be validated against that constraint.
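
As a sketch, an input hash in which the value of a "mappings" field is to be taken from the output field "mappings" of another (possibly still running) job might look like this; the job ID and field names are illustrative:

{
  "mappings": {
    "$dnanexus_link": {
      "job": "job-B2QkQvyK8yjQ48y890400099",
      "field": "mappings"
    }
  }
}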

Data Object Dependencies

If a data object in the input of an origin or master job is not yet in the closed state, then it cannot yet be cloned into the temporary workspace of the new job. As a result, the system will wait until the data object is closed before the job is considered runnable. This allows you to queue up jobs without having to wait or come back when input is ready. For example, you can initiate a file upload and run a reads importer app on the file object without having to wait for the file to finish uploading. Once the upload is done, the job running the reads importer will be marked as runnable and will be assigned to a worker in the cloud.

Explicit Dependencies

When creating any new job, you can specify the dependsOn field in the API call to be an array of job and/or data object IDs. This creates an explicit dependency for the newly created job to wait for any mentioned jobs to transition to the done state and for any mentioned data object IDs (and recursively their hidden linked data objects) to transition to the closed state.

This may be useful, for example, if you have two apps called "collectStats" and "aggregateStats". Every day, you launch some number of "collectStats" app executions on separate inputs, each of which writes several sample statistics files to the project. Once a week, you run your "aggregateStats" app, which runs an aggregation script to collect summary statistics for all the sample-level statistics present in the project. If some "collectStats" jobs are currently running, you can still launch your "aggregateStats" job. By setting dependsOn to the list of running "collectStats" job IDs when creating the job that will run "aggregateStats", the resulting job will wait for the jobs it depends on to finish, without the need for job-based object references in the input.
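
As a sketch, the call that launches the "aggregateStats" job might include a dependsOn array like the one below; the job IDs are placeholders, and the remaining fields of the call are omitted:

{
  "project": "project-B1350vfK8yjjKz3q00q00001",
  "input": {},
  "dependsOn": [
    "job-B2QkQvyK8yjQ48y890400101",
    "job-B2QkQvyK8yjQ48y890400102"
  ]
}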

Jobs with No Input Spec

If the executable does not have an input specification, or the job in question is a subjob, the hash given as input can contain any values; those values are only checked for the presence of DNAnexus links and job-based object references. The job will not run until all job-based object references in its input (including those inside arrays) have been resolved and all DNAnexus links point to closed data objects.
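
For example, a spec-less executable or a subjob could be given an arbitrarily nested hash such as the one below (the field names are illustrative); the system simply scans it for DNAnexus links and job-based object references wherever they appear:

{
  "parameters": {"rounds": 3, "verbose": true},
  "pieces": [
    {"$dnanexus_link": "file-B2QkQvyK8yjQ48y890400012"},
    {"$dnanexus_link": {"job": "job-B2QkQvyK8yjQ48y890400099", "field": "filtered_reads"}}
  ]
}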

Output

Job outputs are treated differently for the following two categories of jobs:

  1. Outputs of jobs created via /job/new

  2. Outputs of jobs created via running an app or applet (see /applet-xxxx/run)

In case #1, the system does not need to perform any checks for the job output, as it is only going to be consulted by other jobs within the same job group. In case #2, the system needs to validate the output (if the app or applet spec formally defines its output), and also clone the output objects into the project context or the parent workspace. In both cases the resulting job output needs to be a JSON hash.

For case #2, the behavior depends on whether the app or applet spec formally defines its outputs. If it does, the job output is validated against the "outputs" field of the spec, similarly to how inputs are validated against the input spec (see the description of the run method). Then, any outputs whose class is a data object class are cloned into the project context or parent workspace (depending on whether the executable was launched by a user from outside of the platform or by another job). If the executable does not have an output spec, the output hash is examined for any links (which can appear anywhere, including in the values of hashes or inside arrays), and exactly those linked objects will be cloned.
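
As an illustration of the case where an output spec is defined, the output hash of an applet whose spec defines a file output named "report" and an int output named "total_reads" (both names are assumptions) might be:

{
  "report": {"$dnanexus_link": "file-B2p0BZKK8yjg1JYyyQJQ000K"},
  "total_reads": 1500000
}

The file referenced by "report" would then be cloned into the project context or parent workspace, as described above.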

Job Dependencies

Job output can also contain job-based object references, which will be resolved once the referenced jobs finish running. If a job’s output contains job-based object references, it is placed in the waiting_on_output state until these references are resolved. A job is also implicitly dependent on all of its child jobs before it can be moved into the done state. Thus an origin or master job is done only after all of its descendant jobs are also done. See Job Lifecycle for more information on job states.

Data Object Dependencies

A job can output a data object that is in the closing state. The job will not be marked as done until all of its data object outputs have transitioned to the closed state and all job-based object references have been resolved.

Output Validation Failure Reasons

When job output is validated, a job may fail with failure reason "OutputError" for the following reasons (this list is non-exhaustive):

  • A job-based object reference in the job output did not resolve successfully (invalid job ID, job ID not found, job not viewable by the referencing job, job in the failed or terminated state, field does not exist, field does not contain a valid object link).

  • For case #2 (here "output object" refers to objects pointed to by links in the output):

    • The JSON output of the job is not a hash.

    • An output object does not exist in the workspace.

    • An output object is not a data object (entities such as users, projects, or jobs are not data objects).

    • For applets that formally define their output:

      • An output object does not satisfy the class constraints.

      • An output object does not satisfy the type constraints.

      • The names of outputs do not match exactly those defined in the spec.

    • An output object is not in the "closing" or "closed" state.

Special Values

DNAnexus links are JSON hashes with a particular syntax that are recognized by the platform. They are used to refer to data objects and to the output fields of jobs. To avoid unexpected behavior, you should not reuse the syntax of these special values in your input and output if you do not wish the platform to treat them differently and potentially overwrite their values.

Data objects in the system are represented by a mapping with the key-value pair:

  • $dnanexus_link string or mapping Either the data object ID (a string), or a mapping with the key/values:

    • id string Data object ID

    • project string ID of the project or container in which to find the data object

If the project ID is not provided and the object is not found in the project from which the job is launched, then the API server will, if possible, find some copy of the object in a project for which the user has VIEW permission and which is not RESTRICTED.

For example:

{
  "$dnanexus_link": "file-BFY5vKKgqZk89Vzk0Zj00GQb"
}
{
  "$dnanexus_link": {
    "id": "file-BFY5vKKgqZk89Vzk0Zj00GQb",
    "project": "project-B3387KygqZk2YQ12Zjf00001"
  }
}

Analysis- and Job-Based Object References

Analysis- and job-based object references are references to values, but instead of directly specifying the value, they refer to the output of another job or analysis that may still be in progress. They can be specified in the input or in the output of another job or analysis.

When an analysis or job-based object reference is placed in the input of a job, the job will be in the "waiting_on_input" state until all references have been resolved. Once all its inputs are ready, the system will transition the job to the "runnable" state, at which point it will be placed in the queue to be run in the cloud.

When a job finishes running and reports an analysis or job-based object reference in its output, the job will be placed in the "waiting_on_output" state. The job will be marked as "done" only after all its outputs are ready and any executions it has launched are also done.

Job-Based Object References

Job-based object references are mappings with the key/values:

  • $dnanexus_link mapping Mapping with the key/values:

    • job string Job ID

    • field string Output field name

Deprecated syntax: job-based object references are also recognized if they are provided as just the inner mapping, i.e. a mapping containing only the two keys job and field, without the surrounding $dnanexus_link key.

For example:

{
  "$dnanexus_link": {
    "job": "job-BFY5vKKgqZk89Vzk0Zj00GQb",
    "field": "mappings"
  }
}
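
In the deprecated syntax, the same reference would be written as just the inner mapping:

{
  "job": "job-BFY5vKKgqZk89Vzk0Zj00GQb",
  "field": "mappings"
}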

Analysis References

Analysis references are links which act much like job-based object references but refer instead to the output of an analysis.

A reference to an analysis's output can be specified as a mapping with the key-value pair:

  • $dnanexus_link mapping Mapping with the key/values:

    • analysis string Analysis ID

    • field string The output field name; this is one of the following:

      • stage ID and the stage's output field name, separated by a dot ("."), i.e. <stage ID>.<stage output field>

      • the exported output field name in the workflow that was run (see the section on customizing workflow IO specifications for more information)

Examples:

{
  "$dnanexus_link": {
    "analysis": "analysis-BFY5vKKgqZk89Vzk0Zj00GQb",
    "field": "stage-BFY5yq6gqZkF9GGZbkq02Vj6.mappings"
  }
}
{
  "$dnanexus_link": {
    "analysis": "analysis-BFY5vKKgqZk89Vzk0Zj00GQb",
    "field": "mappings"
  }
}

Analysis Stage Reference

The output of an analysis's stage can also be referenced explicitly using a mapping with the key-value pair:

  • $dnanexus_link mapping Mapping with the key/values:

    • analysis string Analysis ID

    • stage string Stage ID

    • field string Output field name of the stage

For example:

{
  "$dnanexus_link": {
    "analysis": "analysis-BFY5vKKgqZk89Vzk0Zj00GQb",
    "stage": "stage-BFY5yq6gqZkF9GGZbkq02Vj6",
    "field": "mappings"
  }
}

Array Index References

Sometimes the output of an existing job is in the form of an array of some class of output, and it is desirable to provide a reference to just one of the members of this array when launching another job. In such a situation, the analysis or job-based object reference can be augmented with the integer index field to indicate the element that is referenced.

The syntax for the reference is a mapping with key-value pair:

  • $dnanexus_link mapping Mapping with the key/values:

    • The key-value pairs that are indicated by the analysis or job-based object reference syntax. See above.

    • index integer Index of the array to be linked; this is 0-indexed, so a value of 0 indicates the first element should be used

For example:

{
  "$dnanexus_link": {
    "analysis": "analysis-BFY5vKKgqZk89Vzk0Zj00GQb",
    "stage": "stage-BFY5yq6gqZkF9GGZbkq02Vj6",
    "field": "array_of_mappings",
    "index": 1
  }
}
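
A job-based object reference can be augmented in the same way; for example, to reference the first element of a job's "array_of_mappings" output field:

{
  "$dnanexus_link": {
    "job": "job-BFY5vKKgqZk89Vzk0Zj00GQb",
    "field": "array_of_mappings",
    "index": 0
  }
}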

Symbolic Metadata References

Links can also be used to refer to metadata stored in a particular data object. Both data object links and execution output references can be augmented in this way. A symbolic metadata reference is a mapping with the key-value pair:

  • $dnanexus_link mapping Mapping with key/values:

    • The key-value pairs that are indicated by the link to be augmented. See above for data object and execution reference syntax. See the examples below for how to link to another field in the input.

    • metadata string A restricted JavaScript-style syntax for accessing values within hashes and arrays (see below for more details). The three types of metadata that can be accessed are as follows:

  • Name: using the value "name" will resolve the link to the string name of the data object.

  • Properties: using a value starting with "properties" and specifying a single key will resolve the link to the string property value specified by the property key given.

  • Details: using a value starting with "details" will resolve the link to either the entire JSON details of the data object, or to a value stored within it if any keys are provided.

There are three ways to reference an object. The following examples show how they work:

  1. Data object ID: you can refer to a data object by ID, optionally providing a project ID as well.

    {
      "$dnanexus_link": {
        "id": "record-B65KqzygqZk7KvKK7VgQ00gp",
        "metadata": "name"
      }
    }
    {
      "$dnanexus_link": {
        "id": "record-B65KqzygqZk7KvKK7VgQ00gp",
        "project": "project-B3387KygqZk2YQ12Zjf00001",
        "metadata": "properties.foo"
      }
    }
  2. Input field: in addition to the existing link types, you can also refer to a data object provided in another field of the input. For example, if there are two input fields "genome" and "indexed_genome", you can provide a link for "genome" which resolves to the genome object that was used to create the indexed genome object.

    {
      "$dnanexus_link": {
        "input": "indexed_genome",
        "metadata": "details.genome"
      }
    }
  3. Job-based object reference or analysis reference: you can combine this syntax with that of a job-based object reference or analysis reference so that the metadata will be resolved once the relevant execution finishes.

    {
      "$dnanexus_link": {
        "job": "job-BFxzqYfgqZkFxQyZ1QVQ001k",
        "field": "output_field",
        "metadata": "details.genome"
      }
    }
    {
      "$dnanexus_link": {
        "analysis": "analysis-BFxzq7ygqZk74yg420VQ0005",
        "field": "output_field",
        "metadata": "details.genome"
      }
    }

Note: if a default value is provided for an input, it will be used if the symbolic reference cannot be resolved.

Key Notation

There are two ways to specify a key: dot notation and bracket notation.

  • Dot notation: keys are restricted to alphanumeric characters, plus the characters "$" and "_".

  • Bracket notation: any key can be represented, but the syntax is restricted to either an integer key or a single quoted string (using single or double quotes).

Below are valid and invalid examples of values for the metadata key.

Valid:

  • name

  • properties.foo

  • properties["foo"]

  • properties['foo']

  • properties['f"o"o']

  • properties["hello world!"]

  • details.some_array[2]

  • details.genome.$dnanexus_link

Invalid:

  • name[0]

  • properties[foo]

  • properties['foo' + 'bar']

  • properties.hello world!

  • details.thing[3.4]

  • details.foo\.bar

  • details.foo\ bar
