# Job Identity Tokens for Access to Clouds and Third-Party Services

Job identity tokens are DNAnexus-signed [JSON Web Tokens (JWT)](https://jwt.io/introduction) that establish a security-hardened, verifiable identity linked to a DNAnexus job. Many third-party systems, including cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud, can receive and validate these tokens and exchange them for temporary cloud access credentials. A job can then use those credentials to access approved resources in those systems. This approach, based on interoperable standards (OpenID Connect tokens), avoids the burden of distributing and rotating long-lived cloud secrets. It also lets customers use their cloud provider's authentication and authorization tools to determine, at a granular level, which DNAnexus jobs can access which third-party cloud resources.

App authors can use this feature to provide secure access to the specific external resources required by each job. These can include storage buckets, serverless functions, databases, and secrets vaults. The token authentication system also works with many other services that support JWT authentication, including Salesforce, Oracle API Gateway, and HashiCorp Vault.

## Setup

### Step 1. Establish Trust Between the Platform and Your Cloud Provider

As a first step, you must establish trust between the DNAnexus Platform, as an OIDC identity provider, and the cloud service provider whose system your jobs need to access.

Each cloud provider has its own procedure for establishing trust. For AWS, consult the [AWS IAM OIDC provider creation guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html), using the following parameters:

* For "Provider URL", enter `https://job-oidc.dnanexus.com`.
* For "Audience", enter any string (consisting of lowercase or uppercase letters, numbers, and the symbols `.`, `_`, or `-`) that describes your use case. AWS expects to find this value in the `aud` claim of job identity tokens generated by DNAnexus, so it must match the value passed to [`--aud` when running `dx-jobutil-get-identity-token`](https://documentation.dnanexus.com/user/helpstrings-of-sdk-command-line-utilities#dx-jobutil-get-identity-token).

{% hint style="info" %}
Note: When you use trust conditions (recommended), the audience value does not need to be protected as a secret. Even so, keep it private and share it only with authorized workloads.
{% endhint %}
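
As a rough sketch of the AWS side of this step, the snippet below checks a candidate audience string against the character set described above and builds the matching AWS CLI call. `aws iam create-open-id-connect-provider` with `--url` and `--client-id-list` is an existing AWS CLI command; the function names and the `my-dnanexus-jobs` audience are hypothetical, and depending on your AWS CLI version a `--thumbprint-list` argument may also be needed.

```python
import re

PROVIDER_URL = "https://job-oidc.dnanexus.com"
# Audience characters allowed by the guidance above: letters, digits, '.', '_', '-'.
AUDIENCE_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_audience(audience):
    """Return True if the audience uses only the permitted characters."""
    return AUDIENCE_PATTERN.fullmatch(audience) is not None


def create_provider_command(audience):
    """Build the AWS CLI argument list that registers DNAnexus as an OIDC provider."""
    if not is_valid_audience(audience):
        raise ValueError(f"unsupported characters in audience: {audience!r}")
    return [
        "aws", "iam", "create-open-id-connect-provider",
        "--url", PROVIDER_URL,
        "--client-id-list", audience,
    ]


if __name__ == "__main__":
    # Hypothetical audience value; print the command instead of running it.
    print(" ".join(create_provider_command("my-dnanexus-jobs")))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; pass the list to `subprocess.run` once you have reviewed it.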

### Step 2. Configure Trust Conditions and Access Permissions

Next you need to define which jobs are allowed to request access credentials, by configuring **trust conditions** within your cloud provider's system. These are conditions that must be met before your cloud provider can exchange job identity tokens for credentials.

You also need to specify which resources jobs can access with the credentials they receive. This includes defining access to cloud storage buckets and other cloud services. Configure these **access permissions** within your cloud provider's system to control what resources are available to jobs after token exchange.

Each cloud provider has a different mechanism for configuring trust conditions and access permissions. On AWS, you must create a role and attach a trust policy (for trust conditions) and a permissions policy (for access permissions). For more information, see [Creating a role for a third-party identity provider](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html#idp_oidc_Create).

#### Specifying Trust Conditions

When configuring trust conditions, most clouds allow you to specify rules based on token claims.

Token claims are system fields, included in job identity tokens, that contain metadata about the job. For example, the token claim `launched_by` contains the username of the user who launched the job, while the token claim `project_id` contains the ID of the project in which the job is running. When setting up trust conditions on your cloud provider, you can specify rules such as "`project_id` must match `project-12345`" so that your cloud provider only trusts jobs running in that project.

Job identity tokens issued by DNAnexus include a list of standard claims mandated by the OIDC and JWT specifications, as well as a list of custom DNAnexus-specific claims that capture job metadata.

The standard claims are as follows:

| Claim | Claim Type           | Description                                                                                                                                                             |
| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `aud` | Audience             | A user-provided string denoting the audience that this token is intended for. This is typically expected to match the audience as configured in the third-party system. |
| `iss` | Issuer               | Always set to `https://job-oidc.dnanexus.com`.                                                                                                                          |
| `sub` | Subject              | A "subject" string which is assembled by concatenating user-chosen job metadata. By default, this concatenates the `launched_by` and `job_worker_ipv4` metadata.        |
| `exp` | Expires at           | The time (in seconds from epoch) when this token expires. This is always set to 5 minutes after the issuing time.                                                       |
| `iat` | Issued at            | The time (in seconds from epoch) when this token was issued.                                                                                                            |
| `jti` | JWT ID               | A unique identifier for this token.                                                                                                                                     |
| `nbf` | Not before           | The time (in seconds from epoch) when this token starts being valid. This is always set to the same time as the issuing time.                                           |
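
Because `exp` is always five minutes after `iat`, a token requested long before it is used may already have expired. The sketch below shows one way to check the remaining validity window from the `nbf` and `exp` claims. It assumes the claims are already available as a dictionary, and the function name `seconds_remaining` is hypothetical.

```python
import time


def seconds_remaining(claims, now=None):
    """Return how many seconds the token stays valid (0.0 if not yet valid or expired)."""
    now = time.time() if now is None else now
    if now < claims["nbf"]:
        # The token is not valid yet.
        return 0.0
    return max(0.0, claims["exp"] - now)


if __name__ == "__main__":
    issued = int(time.time())
    # Illustrative claims: nbf equals iat, exp is 300 seconds later.
    claims = {"iat": issued, "nbf": issued, "exp": issued + 300}
    print(seconds_remaining(claims))
```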

The additional DNAnexus-specific claims are as follows:

| Claim                     | Example Value                                                     | Meaning                                                                                                                                                                                                                                                                   |
| ------------------------- | ----------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `job_id`                  | `job-xxxx`                                                        | The id of the job that requested the identity token.                                                                                                                                                                                                                      |
| `root_execution_id`       | `analysis-xxxx`, `job-xxxx`                                       | The id of the top-level execution (analysis or job) that this job is part of.                                                                                                                                                                                             |
| `root_executable_id`      | `applet-xxxx`, `workflow-xxxx`, `app-xxxx`, `globalworkflow-xxxx` | The id of the top-level executable (applet, workflow, app, or global workflow).                                                                                                                                                                                           |
| `root_executable_name`    | `bwa`                                                             | The name of the root executable, populated ONLY when the root executable is an app or a global workflow.                                                                                                                                                                  |
| `root_executable_version` | `1.2.3`                                                           | The version identifier of the root executable, populated ONLY when the root executable is an app or a global workflow.                                                                                                                                                    |
| `executable_id`           | `applet-xxxx`, `app-xxxx`                                         | The id of the app or applet that this job is running.                                                                                                                                                                                                                     |
| `app_name`                | `app-foo`                                                         | The (prefixed with `app-`) name of the executable, populated ONLY when the executable is an app.                                                                                                                                                                          |
| `app_version`             | `1.2.3`                                                           | The version identifier of the executable, populated ONLY when the executable is an app.                                                                                                                                                                                   |
| `project_id`              | `project-xxxx`                                                    | The project id of the job.                                                                                                                                                                                                                                                |
| `bill_to`                 | `org-customer`, `user-selfbilled`                                 | The `billTo` of the job. Project `billTo`s can change while jobs are still running. This field reflects the job's `billTo` which does not change. It represents the historic `billTo` of the job at the time the root execution was created.                              |
| `launched_by`             | `user-alice`                                                      | The user entity that launched the job.                                                                                                                                                                                                                                    |
| `region`                  | `aws:eu-west-2-g`                                                 | The region of the project in which the job is running.                                                                                                                                                                                                                    |
| `job_worker_ipv4`         | `1.2.3.4`                                                         | The IPv4 address as it appears to third parties when this job initiates a connection to any Internet endpoint. Currently, this is the outbound public IPv4 address of the VM where the job is running. For cluster jobs, this IP reflects each VM separately.             |
| `job_try`                 | `0`                                                               | The try (attempt) of the job that requested the identity token.                                                                                                                                                                                                           |
| `kid`                     | `f47790d10`                                                       | The unique key identifier for the identity token.                                                                                                                                                                                                                         |
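
When debugging trust conditions, it can help to see which claims a token actually carries. The following sketch decodes the payload segment of a JWT using only the Python standard library. It assumes the token uses the usual three-segment compact form, and it does not verify the signature, so treat it as an inspection aid only; validation remains the cloud provider's job. The function name `decode_claims` is hypothetical.

```python
import base64
import json


def decode_claims(token):
    """Return the claims in a JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def summarize(claims, names=("aud", "sub", "project_id", "launched_by")):
    """Pick out a few claims that are commonly used in trust conditions."""
    return {name: claims.get(name) for name in names}
```

For example, `summarize(decode_claims(token))` returns a small dictionary that can be logged at the start of a job, with `None` for any claim that is absent.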

The following table summarizes some examples of how token claims can be leveraged to implement specific helpful conditions.

| Goal                                                     | Conditions\*                 |
| -------------------------------------------------------- | ---------------------------- |
| Only jobs launched by `user-alice`                       | `launched_by = user-alice`   |
| Only jobs running in `project-123`                       | `project_id = project-123`   |
| Only jobs running in projects billed to `org-x`          | `bill_to = org-x`            |
| Only jobs that are directly running as part of `app-abc` | `app_name = app-abc`         |
| All jobs from any step of `globalworkflow-xyz`           | `root_executable_name = xyz` |

\* The exact syntax for trust conditions varies across cloud providers.
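
As an illustration for AWS, the sketch below builds a role trust policy that accepts tokens only when the audience matches and the subject matches a pattern. It conditions only on the `aud` and `sub` keys of the provider (`job-oidc.dnanexus.com:aud` and `job-oidc.dnanexus.com:sub`); other job metadata can be brought into scope by including it in the subject with `--subject_claims`. The account ID, audience, and function name are hypothetical placeholders.

```python
import json

PROVIDER_HOST = "job-oidc.dnanexus.com"


def build_trust_policy(account_id, audience, subject_pattern):
    """Build an IAM trust policy for roles assumed with DNAnexus job identity tokens."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{PROVIDER_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The audience must match the value passed to --aud.
                    "StringEquals": {f"{PROVIDER_HOST}:aud": audience},
                    # The subject may use wildcards, for example any worker IP.
                    "StringLike": {f"{PROVIDER_HOST}:sub": subject_pattern},
                },
            }
        ],
    }


if __name__ == "__main__":
    # With the default subject claims, this trusts only jobs launched by user-alice.
    policy = build_trust_policy(
        account_id="123456789012",            # hypothetical AWS account ID
        audience="my-dnanexus-jobs",          # hypothetical audience
        subject_pattern="launched_by;user-alice;job_worker_ipv4;*",
    )
    print(json.dumps(policy, indent=2))
```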

## Using Job Identity Tokens from Inside Your Jobs

For a job to access cloud resources at runtime, it must first get a job identity token, then present it to a cloud service, in exchange for temporary cloud access credentials.

### Step 1. Get a Job Identity Token

To get a job identity token at runtime, your job must execute the command [dx-jobutil-get-identity-token](https://documentation.dnanexus.com/user/helpstrings-of-sdk-command-line-utilities#dx-jobutil-get-identity-token), then save its output, which is used in the next step.

To run this command, you must specify an audience, exactly as you want it to appear in the "aud" claim of the resulting token, using the `--aud` parameter. Later, when your job presents this token to a cloud provider, most cloud providers expect the audience to match whatever was used when [originally configuring trust on the cloud provider's system](#setup).

You can optionally adjust the subject ("sub") claim of the resulting token using the `--subject_claims` parameter. The subject claim is assembled by concatenating user-chosen job metadata.

* The default is `--subject_claims launched_by --subject_claims job_worker_ipv4`, resulting in a subject that looks like this: `launched_by;user-alice;job_worker_ipv4;1.2.3.4`
* To select other metadata to concatenate, specify them in one or more `--subject_claims` parameters. For example, using `--subject_claims job_id --subject_claims job_try` results in a subject that looks like this: `job_id;job-1234;job_try;0`
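
The options above can be sketched in Python as follows. The helper names are hypothetical, the subject format is inferred from the two examples above, and `get_identity_token` assumes the command prints the token on standard output.

```python
import subprocess


def identity_token_command(audience, subject_claims=()):
    """Build the dx-jobutil-get-identity-token argument list."""
    cmd = ["dx-jobutil-get-identity-token", "--aud", audience]
    for claim in subject_claims:
        # One --subject_claims flag per piece of job metadata.
        cmd += ["--subject_claims", claim]
    return cmd


def expected_subject(claims, subject_claims=("launched_by", "job_worker_ipv4")):
    """Predict the "sub" value: name;value pairs joined by semicolons."""
    return ";".join(f"{name};{claims[name]}" for name in subject_claims)


def get_identity_token(audience, subject_claims=()):
    """Run the command inside a job and return the token (assumed to be on stdout)."""
    result = subprocess.run(
        identity_token_command(audience, subject_claims),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

`expected_subject` is useful when writing trust conditions, because it shows the string your cloud provider will see in `sub` for a given choice of `--subject_claims`.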

Depending on your cloud provider, the subject ("sub") claim may have unique significance. On AWS, the subject claim propagates as a context key (`job-oidc.dnanexus.com:sub`) in exchanged credentials when evaluating subsequent permission policies. This allows you to leverage its contents when specifying access permission rules. (Token claims are otherwise limited to trust condition rules, not access permission rules.)
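
As a hedged example of using that context key, the sketch below builds an AWS permissions policy that allows reading objects from a bucket only when the subject matches a pattern. The bucket name, subject pattern, and function name are hypothetical; adapt the action and resource to your own use case.

```python
import json

SUBJECT_KEY = "job-oidc.dnanexus.com:sub"


def build_permissions_policy(bucket, subject_pattern):
    """Allow s3:GetObject on a bucket only for sessions whose subject matches the pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The subject carried over from the job identity token.
                "Condition": {"StringLike": {SUBJECT_KEY: subject_pattern}},
            }
        ],
    }


if __name__ == "__main__":
    policy = build_permissions_policy(
        bucket="example-reference-data",  # hypothetical bucket name
        subject_pattern="launched_by;user-alice;job_worker_ipv4;*",
    )
    print(json.dumps(policy, indent=2))
```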

### Step 2. Exchange a Token for Cloud Access Credentials

Your script must include cloud provider-specific code that exchanges the signed JWT for cloud access credentials, then uses those credentials to access the cloud provider's system.

To do this on AWS:

* Use this command to exchange the signed JWT (from Step 1 above) for a short-lived STS token:\
  \
  `aws sts assume-role-with-web-identity --web-identity-token <signed JWT> --role-arn <ARN of IAM role in your AWS account> --role-session-name <session name> --duration-seconds 900`\
  \
  See [the AWS STS assume-role-with-web-identity documentation](https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role-with-web-identity.html) for detailed guidance.
* The command above returns a JSON response containing a `Credentials` object. Use the `AccessKeyId`, `SecretAccessKey`, and `SessionToken` values from that object to set the following [environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html), which are then used to access AWS resources:
  * `AWS_ACCESS_KEY_ID`
  * `AWS_SECRET_ACCESS_KEY`
  * `AWS_SESSION_TOKEN`
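
The AWS steps above can be sketched in Python as follows. The helper names and the session name are hypothetical. The sketch passes `--role-session-name`, which the AWS CLI requires for this command, and `--output json` so the response can be parsed regardless of the CLI's configured default output format.

```python
import json
import os
import subprocess


def assume_role_command(token, role_arn, session_name, duration_seconds=900):
    """Build the AWS CLI call that exchanges a job identity token for STS credentials."""
    return [
        "aws", "sts", "assume-role-with-web-identity",
        "--web-identity-token", token,
        "--role-arn", role_arn,
        "--role-session-name", session_name,
        "--duration-seconds", str(duration_seconds),
        "--output", "json",
    ]


def credentials_to_env(sts_response):
    """Map the "Credentials" object of the STS response to AWS environment variables."""
    creds = sts_response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


def exchange_token(token, role_arn, session_name="dnanexus-job"):
    """Run the exchange and export the credentials into this process's environment."""
    result = subprocess.run(
        assume_role_command(token, role_arn, session_name),
        check=True,
        capture_output=True,
        text=True,
    )
    env = credentials_to_env(json.loads(result.stdout))
    # Child processes started after this point inherit the credentials.
    os.environ.update(env)
    return env
```

Since the job identity token expires five minutes after it is issued, call `exchange_token` promptly after obtaining the token.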
