Smart Reuse (Job Reuse)

Speed workflow development and reduce testing costs by reusing computational outputs.

A license is required to access the Smart Reuse feature. Contact DNAnexus Sales for more information.

DNAnexus allows organizations to optionally reuse the outputs of earlier jobs that ran the same executable on the same input IDs, even when those jobs ran in other projects or other organizations. This feature has two primary use cases.

Example Use Cases

Dramatically Speed Up R&D of Workflows

For example, suppose you are developing a workflow with n stages, and at each stage you end up debugging an issue. Each stage takes about one hour to develop and run. If you do not reuse outputs during development, the process takes 1 + 2 + 3 + ... + n = n(n + 1)/2 hours, because at every stage you fix something and must recompute results from previous stages. By reusing results for stages that have matured and are no longer modified, the total development time equals the time it takes to develop and run the pipeline once (in this case n hours). That is a reduction by a factor of (n + 1)/2, so the improvement becomes more pronounced for longer workflows and reaches an order of magnitude at around 19 stages.
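To make the arithmetic concrete, here is a small Python sketch. It assumes, as in the example, that every stage takes one hour and that without reuse, fixing stage i means rerunning stages 1 through i. The function names are illustrative only.

```python
def hours_without_reuse(n_stages: int, hours_per_stage: float = 1.0) -> float:
    """Total time when debugging stage i requires rerunning stages 1..i."""
    return sum(i * hours_per_stage for i in range(1, n_stages + 1))


def hours_with_reuse(n_stages: int, hours_per_stage: float = 1.0) -> float:
    """Total time when finished stages are reused and only the new stage runs."""
    return n_stages * hours_per_stage


if __name__ == "__main__":
    for n in (5, 10, 20):
        without = hours_without_reuse(n)
        with_reuse = hours_with_reuse(n)
        print(f"n={n}: {without:.0f} h without reuse, {with_reuse:.0f} h with reuse")
```

For n = 10 this gives 55 hours without reuse against 10 hours with it.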

This feature also saves time when developing forks of existing workflows. For example, suppose you are a developer in an R&D organization and want to modify the last couple of stages of a production workflow in another organization. As long as the new workflow uses the same executable IDs for the earlier stages, the time required for R&D of the forked version equals the time for the last stages.

Dramatically Reduce Costs When Testing at Scale

In production environments, R&D modifications to a workflow often need to be tested at scale. This is especially relevant for workflows used in clinical tests. For example, suppose you are testing a workflow like the forked workflow discussed earlier. This clinical workflow must be tested on thousands of samples (call that number m) before it is vetted for production. Suppose the whole workflow has n stages of about one hour each, as in the earlier example, but only the last k stages changed. Reusing the first n - k stages for every sample saves (n - k)m total compute hours. The savings grow with m and are largest when k is small.
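The savings formula can be checked with a short Python sketch. It assumes one hour per stage by default and that the unchanged stages have already run on the same samples with the same executables, so their outputs are eligible for reuse. The function name and the sample numbers are illustrative.

```python
def compute_hours_saved(n_stages: int, k_changed: int, m_samples: int,
                        hours_per_stage: float = 1.0) -> float:
    """Compute hours saved when only the last k of n stages rerun for each sample."""
    if not 0 <= k_changed <= n_stages:
        raise ValueError("k_changed must be between 0 and n_stages")
    reused_stages = n_stages - k_changed
    return reused_stages * hours_per_stage * m_samples


if __name__ == "__main__":
    # Hypothetical scenario: 10-stage workflow, last 2 stages changed, 1000 samples.
    print(compute_hours_saved(n_stages=10, k_changed=2, m_samples=1000))
```

With these hypothetical numbers, 8 of 10 stages are reused per sample, for 8,000 compute hours saved.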

Example Reuse with WDL

To demonstrate Smart Reuse, the following example uses WDL syntax as supported by the DNAnexus SDK and dxCompiler.

task dupfile {
    File infile

    command { cat ${infile} ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

task headfile {
    File infile

    command { head -10 ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

workflow basic_reuse {
    File infile
    call dupfile { input: infile=infile }
    call headfile { input: infile=dupfile.outfile }
}

The workflow above is a two-step workflow that duplicates a file and takes the first 10 lines from the duplicate.

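Suppose the user has run the workflow above on some file and wants to tweak headfile to output the first 15 lines instead. Based on the changes described below, the tweaked source would look like this:

task dupfile {
    File infile

    command { cat ${infile} ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

task headfile2 {
    File infile

    command { head -15 ${infile} > outfile.txt  }
    output { File outfile = 'outfile.txt' }
}

workflow basic_reuse_tweaked {
    File infile
    call dupfile { input: infile=infile }
    call headfile2 { input: infile=dupfile.outfile }
}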

The only differences from the original are that headfile is renamed to headfile2, basic_reuse is renamed to basic_reuse_tweaked, and the line count changes from 10 to 15. The compilation process automatically detects that dupfile is unchanged but the second stage differs. The generated workflow therefore uses the original executable ID for dupfile but a different executable ID for headfile2.

When basic_reuse_tweaked runs on the same input file with Smart Reuse enabled, the results of the dupfile task are reused. A job on the DNAnexus Platform has already run that exact executable on the same input file, so the system reuses that job's output instead of running dupfile again.

When using Smart Reuse with complex WDL workflows involving WDL expressions in input arguments, scatters, and nested sub-workflows, we recommend launching workflows using the --preserve-job-outputs option. This preserves the outputs of all jobs in the execution tree in the project and increases the potential for subsequent Smart Reuse.

Requirements for Smart Reuse

Jobs can reuse results from previous jobs if the following criteria are met:

  • The organization that is billed for the job has Smart Reuse enabled.

  • Smart Reuse applies only to jobs completed after the org policy was enabled.

  • Smart Reuse is enabled at the executable level (ignoreReuse is not set to true in dxapp.json).

  • Smart Reuse is enabled at runtime (not using the --ignore-reuse flag).

  • A previous job used the exact same executable and input IDs (including the function called within the applet).

  • If an input is watermarked, both the watermark and its version match. Other settings, such as instance type, do not affect reuse.

  • The job being reused has all outputs available and accessible at the time of reuse.

  • You have at least VIEW access to the previous job's outputs.

  • The previous job's outputs still exist on the Platform.

  • For cross-project reuse, the application's dxapp.json file includes "allProjects": "VIEW" in the "access" field.

  • Outputs are assumed to be deterministic.

When a job reuses results, it includes an outputReusedFrom field pointing to the previous job ID. Reused jobs are reported as having run for 0 seconds and are billed at $0. If the reused job or workflow is in a different project or folder, output data is not cloned to the new project or destination folder (the job or workflow is not actually rerun).
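The matching rule can be pictured as a lookup keyed on the executable, the function, and the input IDs. The sketch below is a toy in-memory model, not the Platform's implementation: JobRecord, ReuseIndex, the job numbering, and the IDs are all hypothetical, and the model ignores access checks, watermarks, and org policy.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ReuseKey = Tuple[str, str, Tuple[Tuple[str, str], ...]]


@dataclass(frozen=True)
class JobRecord:
    job_id: str
    executable_id: str
    function: str
    inputs: Tuple[Tuple[str, str], ...]  # sorted (input name, object ID) pairs
    output_reused_from: Optional[str] = None


def reuse_key(executable_id: str, function: str, inputs: Dict[str, str]) -> ReuseKey:
    """Build a key that is equal only for identical executable, function, and inputs."""
    return (executable_id, function, tuple(sorted(inputs.items())))


class ReuseIndex:
    """Toy index of completed jobs, keyed by what they ran on."""

    def __init__(self) -> None:
        self._completed: Dict[ReuseKey, JobRecord] = {}
        self._counter = 0

    def run(self, executable_id: str, inputs: Dict[str, str],
            function: str = "main", ignore_reuse: bool = False) -> JobRecord:
        key = reuse_key(executable_id, function, inputs)
        self._counter += 1
        job_id = f"job-{self._counter:04d}"
        previous = None if ignore_reuse else self._completed.get(key)
        job = JobRecord(job_id, executable_id, function, key[2],
                        output_reused_from=previous.job_id if previous else None)
        if previous is None:
            # Only jobs that actually ran are recorded as reuse candidates.
            self._completed[key] = job
        return job


if __name__ == "__main__":
    index = ReuseIndex()
    first = index.run("applet-dupfile", {"infile": "file-A"})
    second = index.run("applet-dupfile", {"infile": "file-A"})
    third = index.run("applet-headfile2", {"infile": "file-A"})
    for job in (first, second, third):
        print(job.job_id, job.executable_id, job.output_reused_from)
```

In this model the second run points back to the first through output_reused_from, while the third run has a different executable ID and therefore runs on its own.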

Controlling Smart Reuse

Smart Reuse can be controlled at three levels. Runtime settings override executable defaults. Executable defaults override organization policy. If Smart Reuse is disabled at any level, reuse does not occur.

  1. Organization policy: Set the jobReuse policy to true (default is false). See How to Enable Smart Reuse.

  2. Executable default: Set ignoreReuse in dxapp.json to true or false. The default is false, allowing reuse. When ignoreReuse: true, Smart Reuse is disabled for the executable.

  3. Runtime override: Control reuse at runtime using any of these methods:

    • Use the --ignore-reuse flag with dx run to disable reuse.

    • Use --extra-args '{"ignoreReuse": false}' or --extra-args '{"ignoreReuse": true}' to explicitly enable or disable reuse.

    • Set the ignoreReuse parameter in API calls to /app-xxxx/run, /applet-xxxx/run, or /workflow-xxxx/run.

    • For workflows, use --ignore-reuse-stage STAGE_ID to control specific stages.
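The runtime options listed above can be combined on a single dx run command line. As a sketch, the Python helper below only assembles and prints such command lines; it does not call the Platform. The helper name, the executable IDs, and the stage ID stage-0 are placeholders, and only the flags named in the list above are used.

```python
import json
import shlex
from typing import List, Optional, Sequence


def dx_run_command(executable: str,
                   ignore_reuse: bool = False,
                   ignore_reuse_stages: Sequence[str] = (),
                   extra_ignore_reuse: Optional[bool] = None) -> List[str]:
    """Assemble a `dx run` argument list using the reuse flags named above."""
    args = ["dx", "run", executable]
    if ignore_reuse:
        args.append("--ignore-reuse")
    for stage_id in ignore_reuse_stages:
        args += ["--ignore-reuse-stage", stage_id]
    if extra_ignore_reuse is not None:
        # Explicitly enable (False) or disable (True) reuse through --extra-args.
        args += ["--extra-args", json.dumps({"ignoreReuse": extra_ignore_reuse})]
    return args


if __name__ == "__main__":
    # Placeholder IDs; substitute real executable and stage IDs.
    print(shlex.join(dx_run_command("workflow-xxxx", ignore_reuse_stages=["stage-0"])))
    print(shlex.join(dx_run_command("applet-xxxx", extra_ignore_reuse=False)))
```

Returning a list instead of a string keeps quoting explicit: shlex.join wraps the JSON argument in single quotes, matching the --extra-args form shown in the list.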

How to Enable Smart Reuse

To enable or disable Smart Reuse for your organization, an org administrator sets the org's jobReuse policy to true or false.
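As a hedged sketch of what setting the policy might look like from the command line, the snippet below assembles a dx api call and prints it without running it. It assumes the policy is changed through the org's update API method with a policies.jobReuse field. The helper name is illustrative, org-xxxx is a placeholder, and the payload shape should be confirmed against the API reference before use.

```python
import json
import shlex
from typing import List


def job_reuse_policy_command(org_id: str, enabled: bool) -> List[str]:
    """Assemble a `dx api` call that sets the org's jobReuse policy (assumed payload shape)."""
    payload = {"policies": {"jobReuse": enabled}}
    return ["dx", "api", org_id, "update", json.dumps(payload)]


if __name__ == "__main__":
    # org-xxxx is a placeholder for your organization's ID.
    print(shlex.join(job_reuse_policy_command("org-xxxx", enabled=True)))
    print(shlex.join(job_reuse_policy_command("org-xxxx", enabled=False)))
```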

If you plan to reuse results across projects, you must modify all applet and app configurations to include "allProjects": "VIEW" in the "access" field.
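When many executables need this change, the edit to each dxapp.json can be scripted. The following is a minimal sketch that works on an already-parsed dxapp.json dictionary. The function name and the sample applet are hypothetical, and the choice to leave an existing allProjects value untouched is an assumption you may want to revisit.

```python
import json
from typing import Any, Dict


def allow_cross_project_reuse(dxapp: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a dxapp.json dict with "allProjects": "VIEW" in "access"."""
    updated = dict(dxapp)
    access = dict(updated.get("access", {}))
    # Keep an existing allProjects value (it may already grant more than VIEW).
    access.setdefault("allProjects", "VIEW")
    updated["access"] = access
    return updated


if __name__ == "__main__":
    # Hypothetical applet configuration with an unrelated access entry.
    dxapp = {"name": "dupfile", "access": {"network": ["*"]}}
    print(json.dumps(allow_cross_project_reuse(dxapp), indent=2))
```

The function returns a new dictionary and does not modify its argument, so a script can compare the original and patched versions before writing the file back.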

If you are a licensed customer and cannot change this policy yourself, contact DNAnexus Support. If you are interested in Smart Reuse and are not a licensed customer, reach out to DNAnexus Sales or your account executive for more information.
