Smart Reuse (Job Reuse)

Speed workflow development and reduce testing costs by reusing computational outputs.


A license is required to access the Smart Reuse feature. Please contact DNAnexus Sales for more information.

DNAnexus allows organizations to optionally reuse the outputs of jobs that share the same executable and input IDs, even when those outputs reside in other projects or organizations. This feature has two primary use cases.

Example Use Cases

Dramatically Speed Up R&D of Workflows

For example, suppose you are developing an n-stage workflow, and at each stage you end up debugging an issue. Let's assume that each stage takes approximately one hour to develop and run. If you do not reuse outputs as you develop, the process takes 1 + 2 + 3 + ... + n hours, because every time you fix a stage you have to recompute the results of all previous stages. On the other hand, if you simply reuse results for stages that have matured and are no longer being modified, your total development time is just the time it takes to develop and run each stage once (in this case n hours). This is an order of magnitude difference in development time, and the improvement becomes more pronounced for longer workflows.

This feature is also powerful for saving time when developing forks of existing workflows. For example, suppose you are a developer in an R&D organization and want to modify the last couple of stages of a production workflow in another organization. As long as the new workflow uses the same executable IDs for the earlier stages, the time required for R&D of the forked version is only that of the last few stages.

Dramatically Reduce Costs When Testing at Scale

In production environments, it is important to test R&D modifications to a workflow at scale (e.g. a workflow for a clinical test). For example, suppose you are testing a workflow like the forked workflow discussed above: a clinical workflow that needs to be tested on thousands of samples (call that number m) before being vetted to run in a production environment. Let's also suppose the whole workflow takes n hours per sample, but you have only modified the last k stages, accounting for k of those hours. With Smart Reuse you save roughly (n − k) × m total compute hours. This can add up to dramatic cost savings as m grows, especially when k is small.
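To put hypothetical numbers on this: if the full workflow takes n = 10 hours per sample, the modified final stages account for k = 2 of those hours, and the test set has m = 1,000 samples, reuse saves roughly (10 − 2) × 1,000 = 8,000 compute hours for that round of testing.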

Example Reuse with WDL

To demonstrate Smart Reuse, we will use WDL syntax as supported by DNAnexus through our toolkit and by dxCompiler.

task dupfile {
    File infile

    command { cat ${infile} ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

task headfile {
    File infile

    command { head -10 ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

workflow basic_reuse {
    File infile
    call dupfile { input: infile=infile }
    call headfile { input: infile=dupfile.outfile }
}

The workflow above is a two-step workflow that simply duplicates a file and takes the first 10 lines from the duplicate.
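As a rough sketch of how this workflow might be compiled and run (the dxCompiler version, project, folder, and file IDs below are placeholders, and the exact input name depends on how dxCompiler exposes workflow inputs):

# Compile the WDL into a DNAnexus workflow with dxCompiler.
java -jar dxCompiler-2.11.0.jar compile basic_reuse.wdl \
    -project project-xxxx -folder /smart-reuse-demo/

# Run the compiled workflow on an input file; workflow-level inputs are
# typically passed via the generated "common" stage.
dx run /smart-reuse-demo/basic_reuse -istage-common.infile=file-xxxx -y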

Now suppose the user has run the workflow above on some file and simply wants to tweak headfile to output the first 15 lines instead:

task dupfile {
    File infile

    command { cat ${infile} ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

task headfile2 {
    File infile

    command { head -15 ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

workflow basic_reuse_tweaked {
    File infile
    call dupfile { input: infile=infile }
    call headfile2 { input: infile=dupfile.outfile }
}

Here the only differences are that we renamed headfile to headfile2, renamed basic_reuse to basic_reuse_tweaked, and changed 10 to 15. The compilation process automatically detects that dupfile is unchanged but that the second stage is different. The generated workflow therefore uses the original executable ID for dupfile but a new executable ID for headfile2.

When executing basic_reuse_tweaked on the same input file with Smart Reuse enabled, the results of the dupfile task are reused: because a job on the DNAnexus Platform has already run that specific executable with the same input file, the system can reuse its output.
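As a hedged sketch of confirming the reuse (the analysis and file IDs are placeholders, and the jq path assumes stage executions are embedded in the analysis description):

# Launch the tweaked workflow on the same input file.
dx run /smart-reuse-demo/basic_reuse_tweaked -istage-common.infile=file-xxxx -y

# A reused stage job reports outputReusedFrom, runs for 0 seconds, and is
# billed at $0 (see Specific Properties below).
dx describe analysis-xxxx --json | jq '.stages[].execution | {id, name, outputReusedFrom}'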

Specific Properties

Smart Reuse:

  • only applies to jobs run in projects billed to an organization that has Smart Reuse enabled

  • is applied only to completed jobs executed after the policies are updated for an org

Jobs:

  • may only reuse results from other jobs if there exists a previously run job that ran with the exact same executable and input IDs (including the function called within the applet). If an input is watermarked, the watermark and its version must be the same as well. This does not include other settings, such as the instance type the job was run on.

  • may only reuse results across projects if the corresponding application's dxapp.json contains "allProjects": "VIEW" in the "access" field.

  • launched with ignoreReuse: true will not be considered future candidates for job reuse (see the sketch after this list).

  • to be reused must have all outputs intact at the time of reuse; partial output from the job (e.g. some of the output is missing or inaccessible) will prevent the reuse.

  • that reuse results contain a field called outputReusedFrom that refers to the job ID that originally computed the requested outputs. This field never refers to another job that has itself been reused.

  • must have at least VIEW access to the original job's outputs, and those outputs must still exist on the Platform (i.e. they have not since been deleted).

  • that reuse results are reported as having run for 0 seconds and correspondingly are billed as $0.

  • are assumed to be deterministic in output.

  • when the reused job/workflow is located in a different project or folder, the output data will not be cloned to the working project or the new destination folder, since the new jobs/workflows are not actually run.
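As a hedged illustration of the ignoreReuse setting (the applet and file IDs below are placeholders), dx run provides an --ignore-reuse flag that sets this field at launch time:

# Launch a job whose results will not be considered for future reuse.
dx run applet-xxxx -iinfile=file-xxxx --ignore-reuse -y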

Enable/Disable Smart Reuse

If you are an administrator of a licensed org and want to enable Smart Reuse, run this command:

dx api org-myorg update '{"policies":{"jobReuse":true}}'

If you plan to use this feature across projects, you must modify all applet and app configurations to include "allProjects": "VIEW" in the "access" field of dxapp.json, as described above.
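For reference, a minimal illustrative excerpt of the relevant dxapp.json section (merge this into your applet's existing configuration rather than replacing it):

{
  "access": {
    "allProjects": "VIEW"
  }
}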

When using Smart Reuse with complex WDL workflows involving WDL expressions in input arguments, scatters, and nested sub-workflows, we recommend launching workflows with the --preserve-job-outputs option, in order to preserve the outputs of all the jobs in the execution tree in the project and to increase the potential for subsequent Smart Reuse.
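A hedged example of such a launch (the workflow and file IDs, and the input name, are placeholders):

# Preserve the outputs of every job in the execution tree in the project.
dx run workflow-xxxx -istage-common.infile=file-xxxx --preserve-job-outputs -y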


To disable Smart Reuse, set jobReuse to false in the same command. If you are a licensed customer and cannot run the command above, contact support@dnanexus.com. If you are interested in this feature and are not a licensed customer, reach out to sales@dnanexus.com or your account executive for more information.
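For example, the disable counterpart of the command above:

dx api org-myorg update '{"policies":{"jobReuse":false}}'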
