# I/O and Run Specifications

When creating an executable on DNAnexus, you need to define three key specifications that control how it works.

1. The **input specification** defines what data the executable expects to receive.
2. The **output specification** defines the results it produces.
3. The **run specification** defines the resources and instructions needed to execute it.

For applets, the input and output specifications are optional, but any specification you supply must be provided when you create the applet. Once created, applets enter a "closed" state where these specifications cannot be changed.

For apps, both input and output specifications are required. This requirement ensures apps can work together when one app's output matches another app's input requirements. Unlike applets, apps can be modified until they are published (see [Apps](https://documentation.dnanexus.com/developer/api/running-analyses/apps) for more details).

Both apps and applets can request additional **access privileges**, such as network connections or project permissions. However, an executable's project permissions cannot exceed those of the user who runs it.

{% hint style="info" %}
To reflect more inclusive language, DNAnexus has updated its terminology. The terms `master` and `slave` are being replaced with `driver` and `clusterWorker` in Spark documentation articles. The codebase updates are in progress. Some variable names and scripts in the code still use the older terms.
{% endhint %}

## Input Specification

The **input specification** is an array of input parameter descriptors that describe the inputs the executable expects. The order of inputs in the array determines the rendering order on the DNAnexus website when users configure and launch the executable. If an input specification is not provided, an applet can be run with any input, without validation of input argument names or types. An input parameter descriptor is a hash with the following key/value pairs:

* `name` **string** (required) Name of the input. It must be unique across the inputs. Each input name must match the regular expression `^[a-zA-Z_][0-9a-zA-Z_]*$` (only letters, digits, and underscores; at least one character; does not start with a digit).
* `class` **string** (required) The class of the input. This can be any data object class supported by the platform (such as "record", "file", or "applet"), in which case a link to an object ID must be provided for that input when the executable is run. In addition to data object classes, the following value classes are supported:
  * "int" represents a decimal integer. A JSON number must be given.
  * "float" represents a decimal floating-point number. A JSON number must be given.
  * "string" represents a string. A JSON (UTF-8) string must be given.
  * "boolean" represents a boolean. The JSON value true or false must be given.
  * "hash" represents an arbitrary JSON object.

    The input class name (other than "hash") can be preceded by the string "array:", for example, "array:file", to indicate that the input must be a nonempty JSON array whose elements are of the corresponding class (links to objects for data object classes, values for value classes).
* `optional` **boolean** (optional) Whether this input is optional. Defaults to `false`.
* `default` **input class** (optional) The value assigned to the input when it is omitted at launch. This value also tells users configuring and launching the executable through the DNAnexus website what the default is. If no default is provided and the input is not marked `optional`, the input must be specified when launching the executable.
  * In addition to literal values, [symbolic metadata references](https://documentation.dnanexus.com/developer/api/job-input-and-output#symbolic-metadata-references) can be used as default values, for example:

    ```json
    {"inputSpec": [
      {"name": "mappings_bam", "class": "file"},
      {"name": "sample_id", "class": "string", "default":
        {"$dnanexus_link": {"input": "mappings_bam",
                            "metadata": "properties.sample_id"}}
      }
    ]}
    ```
* `type` **string or mapping** (optional) A constraint on the type that the input object must be tagged with (this is valid only for inputs whose class is a data object class). If not provided, no type constraints apply. Can be provided in the following ways:
  * A string to match a single type exactly, for example, `"Reads"`. This requires the input object to have the specified type.
  * An AND condition requiring all specified types to match, for example, `{"$and": ["Reads", "PairedReads"]}`. This requires the input object to have all the specified types.
  * An OR condition requiring at least one specified type to match, for example, `{"$or": ["LetterReads", "ColorReads"]}`. This requires the input object to have at least one of the specified types.
  * Complex nested conditions. For example, to require that an object must be of type "Reads", and also have either "LetterReads" or "ColorReads" type:

    ```json
    {"$and": ["Reads", {"$or": ["LetterReads", "ColorReads"]}]}
    ```
* `patterns` **array of strings** or **mapping** (optional) If a list of strings is provided, it denotes name patterns for this input. This array is stored and returned verbatim by the API as part of the executable's description. Example: `{"patterns": ["*.sam", "*.bam"]}`. By convention, clients may use this array to filter objects displayed to the user as possible inputs for the executable and to infer compatibility between executable outputs and inputs, if no types are specified. If a mapping is provided, a broader set of filters for data objects can be specified and the interpretation of these key-value pairs is open to the client. For example, the client may use the following keys:
  * `name`: **array of strings** (optional) The specification for this field is the same as for the array of strings described above.
  * `pointer_class`: **string** (optional) Any [data object class](https://documentation.dnanexus.com/developer/api/introduction-to-data-object-classes). This implies that the class of this input maps to a data object. A common example is when the class is 'string'. The display shows only data objects matching the specified class.
  * `tag`: **array of strings** (optional) The display shows only data objects containing every tag in this array. This mapping indicates to the client that a data object must meet all criteria for display to the user. For example, consider this value for the `patterns` field:

    ```json
    {
      "name": ["*.sam", "*.bam"],
      "pointer_class": "file",
      "tag": ["foo", "bar"]
    }
    ```

    This suggests to the client that only files with `.sam` or `.bam` extensions as well as both tags `foo` and `bar` should be displayed.
* `suggestions` **array of input class** (optional) List of suggested possible values and/or, for data object inputs, an annotated hash providing a suggested value or a project/folder path containing possible inputs. For data object inputs, the input specification read via `/applet-xxxx/describe` returns an annotated hash. For example, a suggestion `{"$dnanexus_link": "file-xxxx"}` supplied to `/applet/new` appears in the applet description as `{"value": {"$dnanexus_link": "file-xxxx"}}`. An annotated hash has these key/value pairs:
  * `name` **string** (optional) Name for the suggestion
  * `value` **input class** (optional, required if `project` is not provided) A suggested value
  * `project` **string** (optional, required if `path` is provided and if `value` is not) ID of the project containing suggested inputs
  * `path` **string** (optional, can only be present if `project` is also present) Folder path inside `project` in which to suggest inputs
  * `region` **string** (this field is not accepted when specifying an input specification to an API call, but it may be returned when making an API call that returns an input specification) Region of the specified project or data object. This field is set if and only if this is a data object input. Clients are advised to use this field to restrict the list of suggestions displayed to the user to only show values that are in the same region as the project in which the executable is being run. Below are some examples shown in an excerpt of an input specification:

    ```json
    {
      "inputSpec": [
        {
          "name": "reference",
          "class": "file",
          "suggestions": [
            {
              "name": "Reference Genomes",
              "project": "project-xxxx",
              "path": "/mm9/bwa"
            },
            {
              "name": "Modest Mouse",
              "value": {
                "$dnanexus_link": {
                  "project": "project-xxxx",
                  "id": "file-xxxx"
                }
              }
            },
            {
              "name": "Deadmau5",
              "value": {
                "$dnanexus_link": "file-xxxx"
              }
            },
            {
              "$dnanexus_link": "file-xxxx"
            }
          ]
        }
      ]
    }
    ```

    For array inputs, each element in the `suggestions` array should be a suggested value to be included in the array input. For input of class "array:int", the value of `suggestions` should therefore be an array of integers (and not an array of integer arrays).
* `choices` **array of input class** (optional) List specifying the only possible values for the input field. The annotated syntax from `suggestions` can annotate objects with names, but cannot include paths. Like `suggestions`, the `choices` value must be an array of the enclosed class, not an array of arrays, when the input class is an array. At runtime, the system accepts only arrays containing elements from the `choices` array.
* `label` **string** (optional) Human-readable label for the input field to use in place of `name` for display purposes
* `group` **string** (optional) Name of the settings group for this input field, used to organize fields under different UI tabs or sections. Without this value, the input parameter belongs to the first, default group. Groups appear in order of first occurrence in the input specification, with the unnamed group first.
* `help` **string** (optional) A longer description of the input field
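
The name rule and the `type` constraint forms described above can be checked mechanically. The following is a minimal Python sketch, not platform code: the helper names `is_valid_input_name` and `matches_type_constraint` are hypothetical, and the sketch assumes an object's types are available as a set of strings.

```python
import re

# Name rule quoted from the `name` field description above.
NAME_RE = re.compile(r"^[a-zA-Z_][0-9a-zA-Z_]*$")


def is_valid_input_name(name):
    """Return True if `name` satisfies the documented name pattern."""
    return NAME_RE.fullmatch(name) is not None


def matches_type_constraint(constraint, object_types):
    """Evaluate a `type` constraint against the set of types on an object.

    `constraint` is a string, {"$and": [...]}, or {"$or": [...]}.
    The lists may contain nested constraints.
    """
    if isinstance(constraint, str):
        return constraint in object_types
    if "$and" in constraint:
        return all(matches_type_constraint(c, object_types) for c in constraint["$and"])
    if "$or" in constraint:
        return any(matches_type_constraint(c, object_types) for c in constraint["$or"])
    raise ValueError(f"unrecognized type constraint: {constraint!r}")


constraint = {"$and": ["Reads", {"$or": ["LetterReads", "ColorReads"]}]}
print(is_valid_input_name("mappings_bam"))  # True
print(is_valid_input_name("2nd_input"))  # False
print(matches_type_constraint(constraint, {"Reads", "ColorReads"}))  # True
print(matches_type_constraint(constraint, {"Reads"}))  # False
```

A client could run checks like these before calling the API to catch malformed descriptors early. The platform's own validation remains authoritative.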

## Output Specification

The **output specification** is an array of output descriptors that describe the outputs that an executable produces. If an output specification is not given for an applet, it can produce any output without validation. An output descriptor is a hash supporting the same fields as an input parameter descriptor (described earlier), except the following fields:

* `default`
* `suggestions`
* `choices`

The specification of the `patterns` field is the same as for inputs. If the field is a mapping, it suggests to the client that the output field is a data object meeting the criteria described in the `patterns` field.
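
As an illustration of the excluded fields, the sketch below flags input-only fields that appear in an output descriptor. The helper name `input_only_fields_in` and the sample descriptors are hypothetical. How the API itself reacts to such fields is not covered here.

```python
# Fields that the text above excludes from output descriptors.
INPUT_ONLY_FIELDS = {"default", "suggestions", "choices"}


def input_only_fields_in(output_descriptor):
    """Return, sorted, any input-only fields present in an output descriptor."""
    return sorted(INPUT_ONLY_FIELDS.intersection(output_descriptor))


good = {"name": "sorted_bam", "class": "file", "patterns": ["*.bam"]}
bad = {"name": "sample_id", "class": "string", "default": "unknown"}
print(input_only_fields_in(good))  # []
print(input_only_fields_in(bad))  # ['default']
```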

## Run Specification

The **run specification** specifies how to run the executable and is a hash with the following key/values:

* `code` **string** (required) Source code of the executable (the script to execute).
* `systemRequirements` **mapping** (optional) Request specific resources for each of the executable's stages.
  * **key** — Entry point name of the executable, or `"*"` to indicate all entry points that are otherwise not present as keys. Use "main" for the initial entry point, or the function name run in [`/job/new`](https://documentation.dnanexus.com/developer/api/applets-and-entry-points#api-method-job-new) for a subjob.
  * **value** **mapping** — Requested resources for the stage with key/values:
    * `instanceType` **string** (optional) A string specifying the instance type for setting up an [Execution Environment](https://documentation.dnanexus.com/developer/apps/execution-environment) to run your job. See [Instance Types](https://documentation.dnanexus.com/developer/api/running-analyses/instance-types) for a list of possible values. Defaults to `"mem2_hdd2_x2"`.
    * `instanceTypeSelector` **mapping** (optional) Enables dynamic instance type selection for apps and applets billed to organizations with the `instanceTypeSelector` license feature. This option is mutually exclusive with `instanceType` and `clusterSpec`. When specified, the system uses the following algorithm to assign jobs to instance types:

      * **Instance provisioning:** When a job is ready to be assigned to a worker, the system attempts to assign the job to the first instance in the `allowedInstanceTypes` list. If the job is not assigned to that instance type within 10 minutes, the system attempts to assign the job to the next entry in the `allowedInstanceTypes` list. After iterating through all the instance types without successful assignment, the system doubles the wait time (to 20 minutes, then 40 minutes, and so on) and iterates over the `allowedInstanceTypes` list with the increased wait time.
      * **On-demand fallback:** For normal-priority jobs, if spot provisioning times out based on the configured spot wait time, the system attempts to provision on-demand instances. The system follows the same iteration logic, starting with the first instance type in the list and resetting the wait time to 10 minutes.
      * **Instance type transitions:** The job's `instanceType` field reflects the current instance type the system is attempting to assign while the job is in the `runnable` state. Once the job transitions to `running`, the `instanceType` field reflects the final selected instance type. The `instanceTypeTransitions` field in the job description provides the complete history of instance type selection attempts.

      This mapping contains:

      * `allowedInstanceTypes` **array of strings** (required) List of DNAnexus instance types that the job can be assigned to. The system iterates through the list in order when provisioning instances.
    * `clusterSpec` **mapping** (optional) Indicates that this job requires a cluster of instances rather than a single worker node. This mapping should contain the following key/values:
      * `type` **string** (required) The type of cluster.
        * Must be one of `"dxspark"`, `"apachespark"`, or `"generic"`.
      * `version` **string** (optional) Requested version for `dxspark` or `apachespark` clusters.
        * Must be one of `"2.4.4"`, `"3.2.3"`, or `"3.5.2"`.
      * `initialInstanceCount` **integer** (required) The number of nodes (including the driver) in the cluster. Min value is 1, indicating a cluster with no `clusterWorker` nodes.
      * `ports` **string** (optional) A comma-delimited string of ports or port ranges to open between nodes in the cluster. For example, "1000, 1100-1200" opens port 1000 and all ports between 1100 and 1200.
      * `bootstrapScript` **string** (optional) The path to the bootstrap script. The bootstrap script is run on all cluster nodes before the driver begins running the application code.
    * `fpgaDriver` **string** (optional) Specifies the FPGA driver to install on the FPGA-enabled cloud host instance before the app's code runs. Accepted values depend on instance type:
      * mem3\_ssd2\_fpga1\_x24, mem3\_ssd2\_fpga2\_x48, mem3\_ssd2\_fpga8\_x192: `edico-1.4.9.2` (default).
      * mem3\_ssd2\_fpga1\_x8, mem3\_ssd2\_fpga1\_x16, mem3\_ssd2\_fpga1\_x64: `edico-1.4.2` (default), `edico-1.4.5`, and `edico-1.4.7`.
    * `nvidiaDriver` **string** (optional) Specifies the NVIDIA driver to install on the GPU-enabled cloud host instance before the app's code runs. Accepted values are:
      * `R470` (default) uses the driver version [470.256.02](https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-470-256-02/index.html) and supports CUDA 11.4.
      * `R535` uses the driver version [535.247.01](https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-247-01/index.html) and supports CUDA 12.2.
* `executionPolicy` **mapping** (optional) A collection of options that govern automatic job restart on certain types of failures. This can be overridden only via the user-level `/executable-xxxx/run` API call (jobs cannot override this for their subjobs). Includes the following optional key/values:
  * `inheritParentRuntimeExecutionPolicy` **boolean** (optional) When true, the parent job's runtime `restartOn` value overrides this job's own `restartOn` value. Defaults to `false`.
  * `restartOn` **mapping** (optional) Indicates a job restart policy.

    * **key** — A restartable failure reason (`"ExecutionError"`, `"UnresponsiveWorker"`, `"JMInternalError"`, `"AppInternalError"`, `"AppInsufficientResourceError"` (requires the [`allowInstanceUpgradeOnJobRestart`](https://documentation.dnanexus.com/getting-started/key-concepts/organizations#org-policies) org policy for automatic instance upgrade on retry), `"JobTimeoutExceeded"`, or `"SpotInstanceInterruption"`) or `"*"` to indicate all restartable failure reasons that are otherwise not present as keys.
    * **value** **integer** — Maximum number of restarts for the failure reason.

    <div data-gb-custom-block data-tag="hint" data-style="info" class="hint hint-info"><p>The <code>"*"</code> wildcard applies to an error type only if that error type is not explicitly present as a key.</p></div>
  * `maxRestarts` **integer** (optional) Non-negative integer less than 10, indicating the maximum number of times to restart the job. Defaults to `9`.
* `timeoutPolicy` **mapping** (optional) User-specified default timeout policy, configurable by entry point, for jobs running this executable. If unspecified, jobs running this executable have no default timeout policy. The default timeout policy can be overridden at runtime. If present, includes at least one of the following key-value pairs:
  * **key** — Entry point name or `"*"` to indicate all entry points not explicitly specified in `timeoutPolicy`. If an entry point name is not explicitly specified and `"*"` is not present, a job running this executable at that entry point has no default timeout.
  * **value** **mapping** — Default timeout for a job running this executable at the corresponding entry point. Includes at least one of the following key-value pairs:
    * **key** — Unit of time. One of "days", "hours", or "minutes".
    * **value** **number** — Amount of time for the corresponding time unit. Must be nonnegative. The effective default timeout is the sum of the units of time represented in this mapping. Setting the effective timeout to 0 is the same as specifying `null` for the corresponding executable at the corresponding entry point.
* `bundledDepends` **array of mappings** (optional) List of file objects (assets) placed in the execution environment before job execution. The system downloads files to "/" on the worker and unpacks compressed files (supporting at least `tar`, `gz`, and other popular formats). These files must exist in the project containing the executable or be provided as input. Each element is a mapping with the following key/values:
  * `name` **string** (required) Name of the asset.
  * `id` **mapping** (required) DNAnexus link to the asset, with key/value:
    * `$dnanexus_link` **string** (required) ID of the file.
  * `stages` **array of strings** (optional) List of the entry points of the executable for which the asset should be fetched. Use "main" for the initial entry point, or the function name run in [`/job/new`](https://documentation.dnanexus.com/developer/api/applets-and-entry-points#api-method-job-new) for a subjob. Defaults to all stages.
* `distribution` **string** (required) The Linux operating system distribution to be used to run this executable's jobs. The only recognized value is "Ubuntu".
* `release` **string** (required) The Linux operating system release to be used to run this executable's jobs. Recognized values are `24.04` and `20.04`.
* `version` **string** (required) The version of the [application execution environment](https://documentation.dnanexus.com/developer/apps/execution-environment) used to run this executable's jobs. A triplet of (`distribution`, `release`, `version`) uniquely identifies an application execution environment. Recognized values of `version` are:
  * `"0"` for `24.04` and `20.04` releases
* `execDepends` **array of mappings** (optional) Specify package names and versions to be installed by the system package manager or a language-specific package manager in the execution environment before the executable is run. Each element has the key/values:
  * `name` **string** (required) Name of the package to be installed.
  * `package_manager` **string** (optional) The package manager used to install this package. Defaults to `"apt"`.
    * Must be one of `"apt"`, `"pip3"`, `"pip"`, `"gem"`, `"cpan"`, `"cran"`, or `"git"`.
  * `version` **string** (optional) Version of the package to be installed. Unsupported when `package_manager` is "git".
  * `stages` **array of strings** (optional) List of the entry points of the executable for which the package should be installed. Use "main" for the initial entry point, or the function name run in `/job/new` for a subjob. Defaults to all stages.
* `restartableEntryPoints` **string** (optional) Indicates which entry points of the executable are restartable when a restartable error occurs. See the entry for `executionPolicy.restartOn` above for details on which errors are restartable. The value `"master"` indicates that only master job entry points are restartable, whereas `"all"` indicates that all entry points, including subjob entry points, are restartable. Ensure that all entry points of the executable are idempotent before specifying `"all"`. Defaults to `"master"`.
  * Must be one of `"master"` or `"all"`.
* `headJobOnDemand` **boolean** (optional, nullable) If true, the system allocates master jobs from this executable to an on-demand instance, regardless of scheduling priority. All descendant jobs (if any) inherit the scheduling priority, with instance allocations independent of this option. This value can be overridden through input to `app-xxxx/run`, `applet-xxxx/run`, or `job/new`. Defaults to `null`.
* `inheritParentRestartOnPolicy` **boolean** (optional) When true, an `app-xxxx/new` or `applet-xxxx/new` job launched from a parent job inherits its parent's selected `executionPolicy.restartOn` value at runtime. When false, the job retains its policy from build time. Setting this to false does not prevent a parent execution (like an analysis) from overriding the job's `restartOn` policy. Defaults to `false`.
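
Three of the rules above are easier to follow with a small model: the `instanceTypeSelector` iteration order, the `"*"` fallback in `restartOn`, and the summing of time units in `timeoutPolicy`. The Python sketch below is illustrative only and is not the platform's implementation. All function names are hypothetical, the instance type strings are placeholders, and the on-demand fallback step is not modeled.

```python
from itertools import count, islice


def selector_attempts(allowed_instance_types, initial_wait_minutes=10):
    """Yield (instance_type, wait_minutes) pairs in the documented order."""
    for round_number in count():
        # The wait time doubles after each full pass over the list.
        wait = initial_wait_minutes * 2 ** round_number
        for instance_type in allowed_instance_types:
            yield instance_type, wait


def restart_limit(restart_on, failure_reason):
    """Look up the restart limit for a failure reason, falling back to "*"."""
    if failure_reason in restart_on:
        return restart_on[failure_reason]
    return restart_on.get("*")  # None when neither key is present


def effective_timeout_minutes(timeout_policy, entry_point):
    """Sum days/hours/minutes for an entry point; None means no default timeout."""
    units = timeout_policy.get(entry_point, timeout_policy.get("*"))
    if units is None:
        return None
    total = units.get("days", 0) * 24 * 60 + units.get("hours", 0) * 60 + units.get("minutes", 0)
    return total or None  # an effective timeout of 0 is treated like null


# Placeholder instance type names, for illustration only.
attempts = list(islice(selector_attempts(["instance-type-a", "instance-type-b"]), 5))
print(attempts)

restart_on = {"UnresponsiveWorker": 2, "*": 1}
print(restart_limit(restart_on, "UnresponsiveWorker"))  # 2
print(restart_limit(restart_on, "ExecutionError"))  # 1

timeout_policy = {"main": {"hours": 1, "minutes": 30}, "*": {"days": 1}}
print(effective_timeout_minutes(timeout_policy, "main"))  # 90
print(effective_timeout_minutes(timeout_policy, "postprocess"))  # 1440
```

In this model the first five attempts are the two placeholder types at 10 minutes each, the same two at 20 minutes, and then the first type again at 40 minutes.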

These fields are only returned in the responses of `/app-xxxx[/yyyy]/describe` and `/applet-xxxx/describe`. They should not be specified as input to `/applet/new`:

* `systemRequirementsByRegion` **mapping** Contains the specific resources requested for each of the executable's stages, by region. See `systemRequirements` above for more information regarding requesting resources for an executable's stages.
  * **key** — Name of an enabled region of the executable. For applets, there should only be one key, because they are single-region.
  * **value** **mapping** — Specifies the resources requested for each of the app's stages when the app runs in the corresponding region. The syntax of this mapping is like that of `systemRequirements` above.
* `bundledDependsByRegion` **mapping** Contains the bundled dependencies of the executable (app or applet) in each region in which the executable may be run. See `bundledDepends` above for more information regarding bundled dependencies.
  * **key** — Name of the region in which this executable may be run, such as `aws:us-east-1`.
  * **value** **array of strings** — Array of bundled dependencies for the executable in the corresponding region.
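
A client reading a describe response might resolve the instance type for a given region and entry point roughly as follows. This is a sketch under assumptions: the helper name `resolve_instance_type` is hypothetical, it considers only `instanceType` (not `instanceTypeSelector` or `clusterSpec`), and it assumes the documented default applies whenever no `instanceType` is found.

```python
DEFAULT_INSTANCE_TYPE = "mem2_hdd2_x2"  # documented default for `instanceType`


def resolve_instance_type(system_requirements_by_region, region, entry_point):
    """Pick the instanceType for an entry point in a region, honoring "*"."""
    by_entry_point = system_requirements_by_region.get(region, {})
    # "*" covers every entry point that is not present as its own key.
    resources = by_entry_point.get(entry_point, by_entry_point.get("*", {}))
    return resources.get("instanceType", DEFAULT_INSTANCE_TYPE)


described = {
    "aws:us-east-1": {
        "main": {"instanceType": "mem3_ssd2_fpga1_x8"},
        "*": {"instanceType": "mem3_ssd2_fpga1_x16"},
    }
}
print(resolve_instance_type(described, "aws:us-east-1", "main"))  # mem3_ssd2_fpga1_x8
print(resolve_instance_type(described, "aws:us-east-1", "process"))  # mem3_ssd2_fpga1_x16
print(resolve_instance_type({}, "aws:us-east-1", "main"))  # mem2_hdd2_x2
```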

Two run specifications are equivalent **if and only if** all their fields are equal, except for the following fields:

* `systemRequirementsByRegion`
* `systemRequirements`
* `bundledDependsByRegion`
* `bundledDepends`
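
The equivalence rule can be expressed as a comparison that ignores the four listed fields. The sketch below assumes run specifications are held as plain Python dictionaries. The function names and the sample specifications are hypothetical.

```python
# Fields excluded from the equivalence comparison, per the list above.
IGNORED_FIELDS = {
    "systemRequirementsByRegion",
    "systemRequirements",
    "bundledDependsByRegion",
    "bundledDepends",
}


def comparable_fields(run_spec):
    """Return a copy of the run specification without the ignored fields."""
    return {key: value for key, value in run_spec.items() if key not in IGNORED_FIELDS}


def run_specs_equivalent(spec_a, spec_b):
    """Return True if all fields outside IGNORED_FIELDS are equal."""
    return comparable_fields(spec_a) == comparable_fields(spec_b)


base = {"code": "echo hello", "distribution": "Ubuntu", "release": "24.04", "version": "0"}
resized = dict(base, systemRequirements={"*": {"instanceType": "mem2_hdd2_x2"}})
older = dict(base, release="20.04")
print(run_specs_equivalent(base, resized))  # True
print(run_specs_equivalent(base, older))  # False
```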

## Access Requirements

Executables can request additional network access, or additional DNAnexus permissions in the form of project permissions or the ability to perform app development-related tasks on behalf of the user. App development-related tasks include creating, modifying, and publishing apps, as well as adding and removing authorized developers for a particular app. To request additional access, provide a mapping with the following key/values:

* `network` **array of strings** (optional) List of hostnames, hostname wildcards, network masks, and/or the string `"*"` (for unrestricted access) specifying which Internet hosts, domains, or subnets the executable can access. If omitted, the executable has no network access, [except to the API server and selected other resources](https://documentation.dnanexus.com/apps/execution-environment#network-access).
* `project` **string** (optional) A string from "VIEW", "UPLOAD", "CONTRIBUTE", and "ADMINISTER". The executable receives this permissions level in the project it was launched in, or the user's permissions level, whichever is lower. By default, applets but not apps have VIEW access to the project they are launched from.
* `allProjects` **string** (optional) A string from "VIEW", "UPLOAD", "CONTRIBUTE", and "ADMINISTER". The executable receives this permissions level in all projects the user has access to, or the user's permissions level, whichever is lower. If omitted, the executable has no extra permissions.
* `developer` **boolean** (optional) Whether the executable has permission to act with the same developer permissions as the user launching it.
* `projectCreation` **boolean** (optional) Whether the executable can create projects on behalf of the user. Jobs receive ADMINISTER access to projects they create, regardless of the `allProjects` value. Within the same job tree, jobs get access to projects created by any ancestor jobs, but cannot access projects created by their own descendant jobs.
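
The "whichever is lower" rule for `project` and `allProjects` amounts to taking the minimum of two permission levels. The sketch below assumes the levels are ordered from lowest to highest in the order they are listed above. The helper name `effective_project_permission` is hypothetical.

```python
# Assumed ordering, lowest to highest, following the order used in the text above.
PERMISSION_LEVELS = ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]


def effective_project_permission(requested, user_level):
    """Return the lower of the requested level and the launching user's level."""
    return min(requested, user_level, key=PERMISSION_LEVELS.index)


print(effective_project_permission("CONTRIBUTE", "UPLOAD"))  # UPLOAD
print(effective_project_permission("VIEW", "ADMINISTER"))  # VIEW
```

In this model, an executable that requests CONTRIBUTE but is launched by a user with UPLOAD access runs with UPLOAD.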
