Versioning and Publishing Global Workflows

Learn how to create a local, project-based workflow and a versioned global workflow that can be published and listed in the Platform Tools library.

Workflows and Global Workflows

Workflows created on the DNAnexus Platform with an ID of the form workflow-xxxx are data objects stored in a project. Because they are stored in a project, it is easy to share a workflow with other users by adding them to the project. Such "local" workflows are also well suited to fast iteration during development and testing, since we can always delete a workflow and create a new one in its place.

A global workflow is an executable that can be versioned and published to other users. It is implemented as a wrapper around an existing project-based workflow. For users and organizations collaborating on multiple private or public projects, local workflows may be less suitable for long-term maintenance and collaboration. For example, an organization administrator or workflow developer may want to restrict execution of a particular workflow to specific users and organizations, or they may want to allow execution of this workflow across cloud regions and providers. Global workflows address these use cases and more.

Global Workflow Use Cases

  • Version management: to maintain the source code of the workflow across versions

  • Provenance: global workflows maintain an explicit history of changes for the workflow name and its associated ID. The user can always revert to a previous version and versions are immutable.

  • Sharing: a global workflow can be maintained by a set of developers across projects and organizations, and it can be shared with a separate set of users who can run the workflow but cannot modify it.

  • Multi-region support: to maintain one executable that can be run across multiple regions and cloud providers.

  • Immutability guarantees: it is not possible to change/override an existing version of a given global workflow.

  • Ability to list the workflow in the DNAnexus tools library: the authorized users and developers can discover global workflows shared with them in the Tools Library.

Step 1: Build a Local Workflow

The easiest way to build a workflow is by using the web interface. You can also use the DNAnexus SDK, dx-toolkit.

Step 2: Make the Workflow Global

Note that to create a global workflow, you must have dx-toolkit version 0.253 or higher installed. If you need to upgrade your version of dx-toolkit, you can do so using dx upgrade.

Download the Original Workflow

Identify a local workflow you'd like to convert to a global workflow. Then use the dx get command to download a JSON representation (dxworkflow.json) of the local workflow:

$ dx get "Exome Analysis Demo:Exome Analysis Workflow"

This will create a directory "Exome Analysis Workflow" locally and a dxworkflow.json file in it.

Build a Global Workflow

We will use the downloaded dxworkflow.json to create a new global workflow. The only additional fields that are required for the global workflow are: name and version.

The name has to be unique in the global namespace shared by apps and global workflows. A global workflow name can contain only lower-case letters, numbers, "-", ".", and "_", and cannot contain spaces.

"name": "my_global_workflow",
"title": "Exome Analysis Workflow",
"version": "0.0.1",

We can leave the stages and any other fields unchanged.
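The naming rule above can be checked locally before building. The following is a minimal sketch in POSIX shell; the function name is_valid_globalworkflow_name is hypothetical and not part of dx-toolkit, and treating an empty name as invalid is an assumption. It only checks the allowed characters. Uniqueness in the global namespace can only be determined by the platform.

```shell
# Check a candidate name against the rule stated above: only lower-case
# letters, digits, "-", ".", and "_", with no spaces.
# Hypothetical local helper; this is not a dx-toolkit command.
is_valid_globalworkflow_name() {
  case "$1" in
    "") return 1 ;;   # assumption: an empty name is rejected
    *[!abcdefghijklmnopqrstuvwxyz0123456789._-]*) return 1 ;;   # any other character
    *) return 0 ;;
  esac
}

for candidate in my_global_workflow "Exome Analysis Workflow"; do
  if is_valid_globalworkflow_name "$candidate"; then
    echo "ok: $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```

The character class is spelled out instead of using a-z so that the result does not depend on the shell's locale settings.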

We can add fields such as title and summary to make our workflow more user-friendly. It is also good practice to include documentation which can be placed in the Readme.md file in the same directory.

We can now generate our first global workflow as follows:

$ dx build --globalworkflow "Exome Analysis Workflow"

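Here, "Exome Analysis Workflow" is only the name of the local directory that contains dxworkflow.json, so the directory can be renamed freely. The name of the global workflow comes from the name field in dxworkflow.json.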

Describe and Run the Workflow

The dx build command returns the unique ID (globalworkflow-xxxx) of the created version. We can use this ID to refer to the global workflow from now on, for example:

$ dx describe globalworkflow-xxxx

Running a global workflow is the same as running any other workflow. For example:

$ dx run my_global_workflow \
  -i0.reads_fastqgzs="Exome Analysis Demo:/Input/SRR504516_1.fastq.gz"

Step 3: Add Authorized Users or Orgs

We can now specify a list of users with whom we want to share our workflow by using dx add users. We can prepare the list now, but the users will be able to find and access the global workflow only after a version is published. Developers can update the list before or after publishing, and it applies to all versions of the workflow, past and future.

For example, to share our workflow with a user and with an organization we will run the following command:

$ dx add users my_global_workflow user-bob org-partnerorg

Then, we can view who is on the access list for the workflow:

$ dx list users my_global_workflow

and remove a user or org from the list by running:

$ dx remove users my_global_workflow org-partnerorg

Authorized user permissions do not propagate to apps automatically, so if the workflow contains any apps, the users also need access to those apps in order to run the workflow. The same dx add users command can be used to add users to an app.

Step 4: Publish a Version to Users

Once we have tested our workflow version we can release it to the authorized users by executing dx publish:

$ dx publish my_global_workflow/0.0.1

Publishing a workflow version has the following effects:

  • The authorized users can discover the version via dx find globalworkflows.

  • The authorized users can describe and run the version.

  • The authorized users can download the workflow's dxworkflow.json source code with dx get (as well as the dependencies, e.g. applets), though they cannot build new versions with the same global workflow name.

Developers can add and remove users and other developers any time. Adding users to the workflow will give the users access to all the published versions but it will have no effect on the unpublished versions. Unpublished versions are only accessible to the people listed as developers of the workflow.

The "Default" Alias

Executing the dx publish command makes the published version the default one by adding a "default" alias to it (and removing the alias from whichever version was previously marked as "default"). The alias indicates that this version will be invoked whenever the workflow name is used without a version. For example, users can run the workflow:

$ dx run my_global_workflow   # equivalent to "dx run my_global_workflow/0.0.1"

The first created version of a global workflow gets this alias automatically. See dx publish --help for more options for this command.
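The commands from Steps 2 to 4 can be collected into a small dry-run helper that prints them in order for a given workflow name, version, and source directory. This is only a sketch: the function name print_release_commands and its argument order are assumptions made for illustration, and the script prints the commands without executing them so that each one can be reviewed first.

```shell
# Dry run: print the dx commands from Steps 2-4 in order, without running them.
# print_release_commands is a hypothetical helper, not a dx-toolkit command.
# Arguments: <name> <version> <source directory> [user or org IDs ...]
print_release_commands() {
  name=$1
  version=$2
  src_dir=$3
  shift 3
  # Step 2: build the global workflow from the directory holding dxworkflow.json
  printf 'dx build --globalworkflow "%s"\n' "$src_dir"
  # Step 3: add authorized users or orgs, if any were given
  if [ "$#" -gt 0 ]; then
    printf 'dx add users %s %s\n' "$name" "$*"
  fi
  # Step 4: publish the version
  printf 'dx publish %s/%s\n' "$name" "$version"
}

print_release_commands my_global_workflow 0.0.1 "Exome Analysis Workflow" user-bob org-partnerorg
```

With the arguments shown, the helper prints the build, add users, and publish commands used in this tutorial.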

Next Steps

Add Developers

We can enable multiple users to update current or build new versions of our global workflow, for example:

$ dx add developers my_global_workflow user-eve

Either specific users or whole orgs can be developers of a global workflow. We can see who is on the developer list as follows:

$ dx list developers my_global_workflow

To remove developers:

$ dx remove developers my_global_workflow user-eve

Create a New Version

We can create a new version of the workflow by updating the dxworkflow.json and using dx build --globalworkflow. In order to create the new version, we need to explicitly update the version string, e.g. to "0.0.2".
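Because the version string has to be updated by hand, a small helper can compute the next value. The sketch below assumes the version consists of three dot-separated integers without leading zeros, as in the examples in this tutorial. The function name bump_patch is hypothetical and not part of dx-toolkit.

```shell
# Print the next patch version for a "major.minor.patch" string.
# Hypothetical helper; assumes three dot-separated non-negative integers
# without leading zeros.
bump_patch() {
  major=${1%%.*}     # text before the first dot
  rest=${1#*.}       # text after the first dot
  minor=${rest%%.*}  # text between the first and second dot
  patch=${rest#*.}   # text after the second dot
  echo "$major.$minor.$((patch + 1))"
}

bump_patch 0.0.1   # prints 0.0.2
```

The printed value would then be written into the version field of dxworkflow.json before running dx build --globalworkflow again.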

If we lose the source dxworkflow.json we can download it any time with dx get, for example to get version 0.0.2:

$ dx get globalworkflow-my_global_workflow/0.0.2

Any developer of the workflow can download its source code and build new versions. Authorized users who have access to published versions can only download or run them.

Delete

We can mark a global workflow version as deleted, which will make the version unrunnable. It will still be possible to describe the workflow for provenance purposes. For example:

$ dx api globalworkflow-xxxx delete

Please use this route with caution as deleting a global executable can break users' reproducibility requirements.

Deleting all workflow versions will not release the workflow name. It will not be possible to reuse the name for a different global workflow by another user.

Discovering Global Workflows

Searching Available Versions

The command dx find globalworkflows is useful for browsing global workflows that are available to us.

$ dx find globalworkflows

By default, dx find globalworkflows lists one version of each available workflow: the version marked as "default". To print the whole version history of each workflow, add the --all flag, for example:

$ dx find globalworkflows --all

The above command will print a list of published global workflows. To list the workflows that are not published:

$ dx find globalworkflows --unpublished --all

Searching by Category

The --category parameter can be used to restrict the search to workflows from a specific category. Common categories are available as tab completions. For example:

$ dx find globalworkflows --category Variation\ Calling

To view all available categories that we can search by:

$ dx find globalworkflows --category-help

Summary

The table below summarizes the main stages we went through to create and publish our workflow.

| Object name | ID prefix | Namespace | Access | Definition and purpose |
| --- | --- | --- | --- | --- |
| Workflow | workflow | Project | Based on project permissions | A file-like workflow object stored in a project; used for private, light-weight development |
| Global workflow, unpublished | globalworkflow | Global | Developers | A development version of a workflow that is visible to developers but not to users; used for development and testing in multiple regions, release management, and preparation for publishing to users |
| Global workflow, published | globalworkflow | Global | Users, developers | A version of a workflow that is available to authorized users |
| Global workflow, deleted | globalworkflow | Global | Users, developers | A version (previously published or unpublished) that is not runnable but can be described |

Copyright 2024 DNAnexus