I/O and Run Specifications

When creating an executable, an input specification and an output specification can be given to indicate the expected inputs and outputs, and a run specification must be given to describe what the executable needs and how to run it. For an applet, the input and output specifications are optional, but if they are used they must be supplied when the applet is created. Applets are created in the "closed" state, so the specifications are immutable thereafter. For apps, both the input and output specifications are required in order to promote interoperability between apps with compatible inputs and outputs. Unlike applets, apps may be modified until they are published (see Apps for more details on how apps behave). In addition to these specifications, apps and applets can request additional access in the form of network access or extra project and developer permissions. Note that the project permissions given to the executable will never be greater than those of the user running the executable.

DNAnexus is working to phase out outdated terminology and change scripts using those terms to reflect more inclusive language. The terms master/slave will be replaced with driver/clusterWorker in the text of all current Spark documentation articles available on DNAnexus, and will eventually replace the older terms in the codebase. For now, variable names and scripts containing the older terms will still be used in the actual code.

Input Specification

The input specification is an array of input parameter descriptors that define the inputs that the executable expects to run. The order of inputs in the array determines the order in which they will be rendered on the DNAnexus website when a user tries to configure and launch the executable through the website. If an input specification is not provided, then an applet can be run with any input, which will not be validated for input argument names or types. An input parameter descriptor is a hash with the following key/value pairs:

  • name string Name of the input; it must be unique across the inputs. Each input name must match the regular expression ^[a-zA-Z_][0-9a-zA-Z_]*$ (only letters, digits, and underscores; at least one character; must not start with a digit).

  • class string The class of the input. This could be any data object class supported by the platform (such as "record", "file", etc.), in which case a link to an object ID must be provided for that input when the executable is run. In addition to data object classes, the following value classes are supported:

    • "int" represents a decimal integer; a JSON number must be given

    • "float" represents a decimal floating-point number; a JSON number must be given

    • "string" represents a string; a JSON (UTF-8) string must be given

    • "boolean" represents a boolean; the JSON value true or false must be given

    • "hash" represents an arbitrary JSON object

    The input class name (other than "hash") can be preceded by the string "array:", e.g. "array:file", to indicate that the input must be a nonempty JSON array whose elements are values of the corresponding class (links to objects, in the case of a data object class).

  • optional boolean (optional, default false) Whether this input is optional

  • default input class (optional) A value to which the input will be set if omitted when running the executable. This value is also used to show the default to users when they configure and launch the executable through the DNAnexus website. If not provided, then this input must always be given when launching the executable.

    • In addition to literal values, symbolic metadata references can be used as default values, for example:

          {"inputSpec": [
            {"name": "mappings_bam", "class": "file"},
            {"name": "sample_id", "class": "string", "default":
              {"$dnanexus_link": {"input": "mappings_bam",
                                  "metadata": "properties.sample_id"}}}
          ]}

  • type string or mapping (optional) A constraint on the type that the input object must be tagged with (this is valid only for inputs whose class is a data object class). If not provided, no type constraints will be required. Constraints are specified via the following grammar:

      constraint ::= <string>
      constraint ::= { "$and": [ constraint, constraint, … ] }
      constraint ::= { "$or": [ constraint, constraint, …] }

    The "$and" operator means that all constraints need to be satisfied; the "$or" operator means that at least one constraint needs to be satisfied. A string represents a type that the object must have in its types. For example, to say that an object must be of type "Reads" and also of type "LetterReads" or "ColorReads", specify the following constraint:

    {"$and": ["Reads", {"$or": ["LetterReads", "ColorReads"]}]}
  • patterns array of strings or mapping (optional) If a list of strings is provided, it denotes name patterns for this input. This array is stored and returned verbatim by the API as part of the executable's description. Example: {"patterns": ["*.sam", "*.bam"]}. By convention, clients may use this array to filter the objects displayed to the user as possible inputs for the executable, and to infer compatibility between executable outputs and inputs if no types are specified. If a mapping is provided, a broader set of filters for data objects can be specified, and the interpretation of these key-value pairs is left to the client. For example, the client may use the following keys:

    • name: array of strings (optional) the specification for this field is identical to the array of strings described above.

    • pointer_class: string (optional) Any data object class. This could imply that the class of this input (e.g. 'string') actually maps to a data object. Only data objects matching the specified class will be displayed.

    • tag: array of strings (optional) Only data objects that contain every tag in this array will be displayed.

      A mapping suggests to the client that all of its criteria must be met by a data object for it to be displayed to the user. For example, suppose the value of the patterns field is:

      {"name": ["*.sam", "*.bam"],
       "pointer_class": "file",
       "tag": ["foo", "bar"]}

      This suggests to the client that only files with '.sam' or '.bam' extensions as well as both tags 'foo' and 'bar' should be displayed.

  • suggestions array of input class (optional) List of suggested possible values and/or, for the case of data object inputs, an annotated hash which provides a suggested value or a project/folder path in which possible inputs can be found. For data object inputs, when the input specification is read back out via e.g. /applet-xxxx/describe an annotated hash will always be returned. (For example, if {"$dnanexus_link": "file-xxxx"} was supplied as a suggestion to /applet/new, when the resulting applet was described the same suggestion would be read out as {"value": {"$dnanexus_link": "file-xxxx"}}.) An annotated hash has the following key/value pairs:

    • name string (optional) Name for the suggestion

    • value input class (optional; required if project is not provided) A suggested value

    • project string (optional; required if path is provided and if value is not) ID of the project containing suggested inputs

    • path string (optional; can only be present if project is also present) Folder path inside project in which to suggest inputs

    • region string (this field is not accepted when specifying an input specification to an API call, but it may be returned when making an API call that returns an input specification) Region of the specified project or data object. This field is set if and only if this is a data object input. Clients are advised to use this field to restrict the list of suggestions displayed to the user to only show values that are in the same region as the project in which the executable is being run.

    Below are some examples, shown in an excerpt of an input specification:

            "inputSpec": [
              {
                "name": "reference",
                "class": "file",
                "suggestions": [
                  {"name": "Reference Genomes", "project": "project-xxxx", "path": "/mm9/bwa"},
                  {"name": "Modest Mouse", "value": {"$dnanexus_link": {"project": "project-xxxx", "id": "file-xxxx"}}},
                  {"name": "Deadmau5", "value": {"$dnanexus_link": "file-xxxx"}},
                  {"$dnanexus_link": "file-xxxx"}
                ]
              }
            ]

    Note that for array inputs, each element in the suggestions array should be a suggested value to be included in the array input. For an input of class "array:int", the value of suggestions should therefore be an array of integers (and not an array of integer arrays).

  • choices array of input class (optional) List of the only values that the input field can take. The annotated syntax described under suggestions may also be used to annotate objects with a name, but a path cannot be provided. As with suggestions, when the input class is an array, the value of choices is expected to be an array of the enclosed class and not an array of arrays. At runtime, the system will expect an array composed of elements that appear in the choices array.

  • label string (optional) Human-readable label for the input field to use in place of name for display purposes

  • group string (optional) Name for the group of settings this input field belongs to, which will be used when displaying this field in the UI under different tabs or sections. If not supplied, the input parameter will belong to the first and default group. By convention, groups will be shown in order of first appearance in the input specification with the unnamed group always appearing first.

  • help string (optional) A longer description of the input field
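Two of the rules above, the input name pattern and the type constraint grammar, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names is_valid_input_name and satisfies_type_constraint are hypothetical and not part of any DNAnexus client library, and the platform performs its own validation.

```python
import re

# Regular expression for input names, as given in the "name" entry above.
NAME_RE = re.compile(r"^[a-zA-Z_][0-9a-zA-Z_]*$")


def is_valid_input_name(name):
    """Hypothetical helper: True if `name` matches the documented pattern."""
    # fullmatch rejects strings with a trailing newline, which "$" alone allows.
    return NAME_RE.fullmatch(name) is not None


def satisfies_type_constraint(constraint, object_types):
    """Hypothetical helper: evaluate a type constraint against an object's types.

    A string must appear in `object_types`; "$and" requires every
    sub-constraint to hold; "$or" requires at least one to hold.
    """
    if isinstance(constraint, str):
        return constraint in object_types
    if isinstance(constraint, dict) and len(constraint) == 1:
        (operator, operands), = constraint.items()
        if operator == "$and":
            return all(satisfies_type_constraint(c, object_types) for c in operands)
        if operator == "$or":
            return any(satisfies_type_constraint(c, object_types) for c in operands)
    raise ValueError("malformed type constraint: %r" % (constraint,))


# The constraint used as an example in the "type" entry above.
constraint = {"$and": ["Reads", {"$or": ["LetterReads", "ColorReads"]}]}
```

With this sketch, an object whose types are ["Reads", "ColorReads"] satisfies the constraint, while an object typed only as ["Reads"] does not.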

Output Specification

The output specification is an array of output descriptors that describe the outputs that an executable produces. If an output specification is not given for an applet, then it can produce any output, which will not be validated. An output descriptor is a hash supporting the same fields as the fields of an entry in the inputs array described earlier, except the following fields:

  • default

  • suggestions

  • choices

The specification of the patterns field is the same as for inputs. If the field is a mapping, this suggests to the client that the output field is a data object meeting the criteria described in the patterns field.
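Because the interpretation of patterns is left to the client, the following Python sketch shows just one way a client might apply it when filtering data objects. It assumes shell-style wildcard matching via the standard fnmatch module and a hypothetical object representation (a dict with "name", "class", and "tags" keys); neither assumption comes from the API itself.

```python
from fnmatch import fnmatchcase


def matches_patterns(patterns, obj):
    """Hypothetical client-side filter for a `patterns` value.

    `patterns` is either a list of name patterns or a mapping with the
    optional keys "name", "pointer_class", and "tag". `obj` is an assumed
    dict of the form {"name": ..., "class": ..., "tags": [...]}.
    """
    if isinstance(patterns, list):
        # A plain list is treated like a mapping that only has "name".
        patterns = {"name": patterns}
    names = patterns.get("name")
    if names is not None and not any(fnmatchcase(obj["name"], p) for p in names):
        return False
    pointer_class = patterns.get("pointer_class")
    if pointer_class is not None and obj["class"] != pointer_class:
        return False
    # Every tag listed in the pattern must be present on the object.
    return set(patterns.get("tag", [])) <= set(obj.get("tags", []))


example_patterns = {
    "name": ["*.sam", "*.bam"],
    "pointer_class": "file",
    "tag": ["foo", "bar"],
}
example_object = {"name": "sample.bam", "class": "file", "tags": ["foo", "bar", "baz"]}
```

Here matches_patterns(example_patterns, example_object) is true, since the name matches "*.bam", the class is "file", and both required tags are present.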

Run Specification

The run specification specifies how to run the executable and is a hash with the following key/values:

  • code string: Source code of the executable (the script that will be run).

  • systemRequirements mapping (optional): Request specific resources for each of the executable's stages.

    • key: Entry point name of the executable, or "*" to indicate all entry points that are otherwise not present as keys; use "main" for the initial entry point, or the function name run in /job/new for a subjob.

    • value mapping: Requested resources for the stage with key/values:

      • instanceType string (default "mem2_hdd2_x2"): A string specifying the instance type on which an Execution Environment will be set up to run your job. See Instance Types for a list of possible values.

      • clusterSpec mapping (optional): Indicates that this job requires a cluster of instances rather than just a single worker node. This mapping should contain the following key/values:

        • type string: The type of cluster. Supported values are "dxspark", "apachespark", and "generic".

        • version string (optional): Requested version for dxspark or apachespark clusters. Supported values are "2.4.4" and "3.2.3".

        • initialInstanceCount integer: The number of nodes (including the driver) in the cluster. The minimum value is 1, indicating a cluster with no clusterWorker nodes.

        • ports string (optional): A comma-delimited string of ports or port ranges to open between nodes in the cluster. For example, "1000, 1100-1200" would open port 1000 and all ports between 1100 and 1200.

        • bootstrapScript string (optional): The path to the bootstrap script. The bootstrap script is run on all cluster nodes before the driver begins running the application code.

      • fpgaDriver string (optional): Specifies the FPGA driver that will be installed on the FPGA-enabled cloud host instance prior to execution of the app's code. Accepted values are "edico-1.4.2" (installed on FPGA-enabled instances by default), "edico-1.4.5", and "edico-1.4.7".

  • executionPolicy mapping (optional): A collection of options that govern automatic job restart upon certain types of failures; this can only be overridden via the user-level /executable-xxxx/run API call (jobs cannot override this for their subjobs). Includes the following optional key/values:

    • inheritParentRuntimeExecutionPolicy boolean (optional, false when undefined): When true, the parent job’s runtime restartOn value overrides the job's own restartOn value.

    • restartOn mapping (optional): Indicates a job restart policy

      • key: A restartable failure reason ("ExecutionError", "UnresponsiveWorker", "JMInternalError", "AppInternalError", or "JobTimeoutExceeded") or "*" to indicate all restartable failure reasons that are otherwise not present as keys

      • value int: Maximum number of restarts for the failure reason

    • maxRestarts int (optional, default 9): Non-negative integer less than 10, indicating the maximum number of times that the job will be restarted

  • timeoutPolicy mapping (optional): User-specified default timeout policy, configurable by entry point, for jobs running this executable. If unspecified, it indicates that no jobs running this executable will have a default timeout policy. This default timeout policy may be overridden at runtime. If present, includes at least one of the following key-value pairs:

    • key: Entry point name or "*" to indicate all entry points not explicitly specified in timeoutPolicy. If an entry point name is not explicitly specified and "*" is not present, then a job running this executable at that entry point will not have a default timeout.

    • value mapping: Default timeout for a job running this executable at the corresponding entry point. Includes at least one of the following key-value pairs:

      • key: Unit of time; one of "days", "hours", or "minutes".

      • value number: Amount of time for the corresponding time unit; must be nonnegative. The effective default timeout is the sum of the units of time represented in this mapping. Note that setting the effective timeout to 0 is the same as specifying null for the corresponding executable at the corresponding entry point.

  • bundledDepends array of mappings (optional): List of file objects (assets) to be placed in the execution environment before the job is run. The files will be downloaded to "/" on the worker and unpacked if compressed (with a mechanism that supports at least tar, gz, and other popular formats). Note that these files must exist in the project in which the executable is being run or given as part of the input. Each element is a mapping with the following key/values:

    • name string: Name of the asset.

    • id mapping: DNAnexus link to the asset, with key/value:

      • $dnanexus_link string: ID of the file

    • stages array of strings (optional, default all stages): List of the entry points of the executable for which the asset should be fetched; use "main" for the initial entry point, or the function name run in /job/new for a subjob

  • distribution string: The Linux operating system distribution to be used to run this executable's jobs. The only recognized value is "Ubuntu".

  • release string: The Linux operating system release to be used to run this executable's jobs. Recognized value: "20.04".

  • version string: The version of the application execution environment used to run this executable's jobs. A triplet of (distribution, release, version) uniquely identifies an application execution environment. Recognized values of version are:

    • "0" for the 20.04 release

  • execDepends array of mappings (optional): Specify package names and versions to be installed by the system package manager or a language-specific package manager in the execution environment before the executable is run. Each element has the key/values:

    • name string: Name of the package to be installed.

    • package_manager string (optional, default "apt"): The package manager used to install this package; valid options are "apt", "pip3", "pip", "gem", "cpan", "cran", and "git"

    • version string (optional): Version of the package to be installed; unsupported when package_manager is "git"

    • stages array of strings (optional, default all stages): List of the entry points of the executable for which the package should be installed; use "main" for the initial entry point, or the function name run in /job/new for a subjob

  • restartableEntryPoints string (optional, default "master"): Indicates which entry points of the executable are restartable when a restartable error occurs; see the entry for executionPolicy.restartOn above for details on which errors are restartable. If a string, then restartableEntryPoints must be one of "master" or "all". The value "master" indicates that only master job entry points are restartable, whereas "all" indicates that all entry points, including subjob entry points, are restartable; please ensure that all entry points of the executable are idempotent before specifying "all".

  • headJobOnDemand boolean (optional, default null): If true, then the master jobs run from this executable will be allocated to an on-demand instance, regardless of their scheduling priority. All of their descendant jobs (if any) inherit the scheduling priority, and their instance allocations are independent of this option. This can be overridden via an input to /app-xxxx/run, /applet-xxxx/run, or /job/new.

  • inheritParentRestartOnPolicy boolean (optional, default false): If set to true, an app-xxxx/new or applet-xxxx/new job from a parent job will inherit its parent's selected executionPolicy.restartOn value at runtime. If set to false, the job will retain the policy that was set at build time. However, setting this flag to false will not stop a parent execution, such as an analysis, from overriding the job's restartOn policy.
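The timeoutPolicy rules above (per-entry-point lookup, the "*" fallback, and summing the time units) can be illustrated with a short Python sketch. The function name default_timeout_minutes is hypothetical, and expressing the result in minutes is a choice made for this example only; the platform computes effective timeouts itself.

```python
# Conversion factors for the three documented time units.
MINUTES_PER_UNIT = {"days": 24 * 60, "hours": 60, "minutes": 1}


def default_timeout_minutes(timeout_policy, entry_point):
    """Hypothetical helper: default timeout in minutes, or None if there is none.

    Looks up `entry_point` first and falls back to the "*" key only when the
    entry point is not listed explicitly. A total of 0 is treated the same
    as no timeout, as described for timeoutPolicy above.
    """
    if not timeout_policy:
        return None
    units = timeout_policy.get(entry_point, timeout_policy.get("*"))
    if not units:
        return None
    total = sum(MINUTES_PER_UNIT[unit] * amount for unit, amount in units.items())
    return total or None


# Example policy: 1.5 hours for "main", one day for every other entry point.
example_timeout_policy = {
    "main": {"hours": 1, "minutes": 30},
    "*": {"days": 1},
}
```

For this example policy, the "main" entry point gets a default timeout of 90 minutes, and any other entry point (for instance a hypothetical "process" entry point) gets 1440 minutes through the "*" key.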

These fields are only returned in the responses of /app-xxxx[/yyyy]/describe and /applet-xxxx/describe. They should not be specified as input to /applet/new:

  • systemRequirementsByRegion mapping: Mapping that contains the specific resources requested for each of the executable's stages, by region. See systemRequirements above for more information regarding requesting resources for an executable's stages.

    • key: Name of an enabled region of the executable. For applets, there should only be one key, because they are single-region.

    • value mapping: A mapping specifying the specific resources requested for each of the app's stages when the app is run in the corresponding region. The syntax of this mapping is like that of systemRequirements above.

  • bundledDependsByRegion mapping: Mapping that contains the bundled dependencies of the executable (app or applet) in each region in which the executable may be run. See bundledDepends above for more information regarding bundled dependencies.

    • key: Name of the region (e.g., aws:us-east-1) in which this executable may be run.

    • value array of strings: Array of bundled dependencies for the executable in the corresponding region.
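As an illustration of how a client might read systemRequirementsByRegion from a describe response, the sketch below looks up the instance type for a given region and entry point. The helper name resolve_instance_type and the instance types in the example data are illustrative assumptions. Falling back to the documented default "mem2_hdd2_x2" when an entry has no instanceType is also an assumption of this sketch, not a statement about how the platform resolves it.

```python
# Default instanceType documented under systemRequirements.
DEFAULT_INSTANCE_TYPE = "mem2_hdd2_x2"


def resolve_instance_type(system_requirements_by_region, region, entry_point):
    """Hypothetical helper: instance type requested for an entry point in a region.

    Uses the "*" key for entry points that are not listed explicitly, and the
    documented default when no instanceType is requested at all.
    """
    by_entry_point = system_requirements_by_region.get(region, {})
    requested = by_entry_point.get(entry_point, by_entry_point.get("*", {}))
    return requested.get("instanceType", DEFAULT_INSTANCE_TYPE)


# Illustrative data only; the instance type values are examples.
example_by_region = {
    "aws:us-east-1": {
        "main": {"instanceType": "mem1_ssd1_x4"},
        "*": {"instanceType": "mem1_ssd1_x8"},
    }
}
```

With this data, "main" in aws:us-east-1 resolves to "mem1_ssd1_x4", any other entry point in that region resolves to "mem1_ssd1_x8", and a region without an entry resolves to the default.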

Two run specifications are equivalent if and only if all of their fields are equal, with the exception of the following fields:

  • systemRequirementsByRegion

  • systemRequirements

  • bundledDependsByRegion

  • bundledDepends
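The equivalence rule above amounts to comparing two run specifications after dropping the four listed fields. A minimal Python sketch follows, assuming run specifications are available as plain dicts (for example, parsed from JSON); the function name run_specs_equivalent is hypothetical, and the sketch does not normalize omitted optional fields against their defaults.

```python
# Fields that are excluded from the comparison, as listed above.
IGNORED_FIELDS = {
    "systemRequirementsByRegion",
    "systemRequirements",
    "bundledDependsByRegion",
    "bundledDepends",
}


def run_specs_equivalent(spec_a, spec_b):
    """Hypothetical helper: compare two run specifications, ignoring the listed fields."""
    def comparable(spec):
        return {key: value for key, value in spec.items() if key not in IGNORED_FIELDS}
    return comparable(spec_a) == comparable(spec_b)


# Two specs that differ only in systemRequirements are considered equivalent.
spec_one = {
    "code": "echo hello",
    "distribution": "Ubuntu",
    "systemRequirements": {"main": {"instanceType": "mem2_hdd2_x2"}},
}
spec_two = {"code": "echo hello", "distribution": "Ubuntu"}
```

In this example spec_one and spec_two are equivalent, whereas changing the code field in either one would make them differ.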

Access Requirements

Executables can request additional network access, or additional DNAnexus permissions in the form of project permissions or the ability to perform app development-related tasks on behalf of the user. App development-related tasks include creating, modifying, and publishing apps, as well as adding and removing authorized developers for a particular app. To request additional access, provide a mapping with the following key/values:

  • network array of strings (optional): List of hostnames, hostname wildcards, network masks, and/or the string "*" (for unrestricted access) specifying which Internet hosts, domains, or subnets the executable will be permitted to access. If omitted, the executable has no network access, except to the API server.

  • project string (optional): A string from "VIEW", "UPLOAD", "CONTRIBUTE", and "ADMINISTER". The executable will have this permissions level in the project it was launched in, or the user’s permissions level, whichever is lower. Note: by default, applets but not apps have VIEW access to the project they are launched from.

  • allProjects string (optional): A string from "VIEW", "UPLOAD", "CONTRIBUTE", and "ADMINISTER". The executable will have this permissions level in all projects the user has access to, or the user’s permissions level, whichever is lower. If omitted, the executable will have no extra permissions.

  • developer boolean (optional): Whether the executable will have permission to act with the same developer permissions as the user launching it.

  • projectCreation boolean (optional): Whether the executable will be able to create projects on behalf of the user; jobs will have ADMINISTER access to projects they create, regardless of the value of allProjects. Note that within the same job tree, jobs will receive access to projects created by any ancestor jobs, but they will not be able to access projects created by their own descendant jobs.
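The "whichever is lower" rule for project and allProjects can be sketched as follows. The sketch assumes that the four permission levels are ordered from lowest to highest in the order in which they are listed above; the function name effective_permission is hypothetical.

```python
# Permission levels, assumed here to be ordered from lowest to highest.
PERMISSION_LEVELS = ["VIEW", "UPLOAD", "CONTRIBUTE", "ADMINISTER"]


def effective_permission(requested, user_level):
    """Hypothetical helper: the lower of the requested level and the user's level."""
    return min(requested, user_level, key=PERMISSION_LEVELS.index)
```

For example, an executable that requests "ADMINISTER" but is launched by a user with "UPLOAD" access would effectively run with "UPLOAD", while a request for "VIEW" by a user with "CONTRIBUTE" access stays at "VIEW".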

Copyright 2024 DNAnexus