Archiving Files

Learn how to archive files, a cost-effective way to retain files in accord with data-retention policies, while keeping them secure and accessible, and preserving file provenance and metadata.


A license is required to use the DNAnexus Archive Service. Contact DNAnexus Sales for more information.

Archiving in DNAnexus is file-based. You can archive individual files, folders of files, or all of a project's files, and save on storage costs. You can also easily unarchive one or more files, folders, or projects when you need to make the data available for further analyses.

The DNAnexus Archive Service is available via the API in Amazon AWS and Microsoft Azure regions.

Overview

File Archival States

To follow the archival life cycle, the operations that can be performed on files, and how billing works, it's helpful to know the different file states associated with archival. A file in a project can be in one of four archival states:

| Archival state | Details |
| --- | --- |
| live | The file is in standard storage, such as AWS S3 or Azure Blob. |
| archival | Archival requested on the current file, but other copies of the same file are in the live state in multiple projects with the same billTo entity. The file is still in standard storage. |
| archived | The file is in archival storage, such as AWS S3 Glacier or Azure Blob ARCHIVE. |
| unarchiving | Unarchival requested on the current file. The file is in transition from archival storage to standard storage. |

A file's archival state determines which operations can be performed on it. See the table below for the operations allowed in each state.

| Archival state | Download | Clone | Compute | Archive | Unarchive |
| --- | --- | --- | --- | --- | --- |
| live | Yes | Yes | Yes | Yes | No |
| archival | No | Yes* | No | No | Yes (Cancel archive) |
| archived | No | Yes | No | No | Yes |
| unarchiving | No | No | No | No | No |

* The clone operation fails if the file is actively transitioning from the archival state to the archived state.
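
To check which archival state a given copy of a file is in, you can describe the file in the context of a project. The sketch below uses placeholder IDs and assumes the archivalState field returned by the file describe API:

    # Describe a file in the context of a project and return only its archival state
    # (IDs are placeholders; the archivalState field is an assumption based on the file describe API)
    dx api file-xxxx describe '{"project": "project-xxxx", "fields": {"archivalState": true}}'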

File Archival Life Cycle

When the /project-xxxx/archive API is called on a file object, the file transitions from the live state to the archival state. Only when all copies of the file in all projects with the same billTo organization are in the archival state does the platform automatically transition the file to the archived state.

Likewise, when the /project-xxxx/unarchive API is called on a file in the archived state, the file transitions from the archived state to the unarchiving state. While unarchiving, the file is being restored by the third-party storage platform (e.g., AWS or Azure); this may take a while, depending on the retrieval option selected for the specific platform. When the unarchival process completes and the file becomes available in standard storage, the file transitions to the live state.
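
As a minimal sketch of these two calls (project and file IDs are placeholders; the unarchive input is assumed to mirror the files input of the archive call shown under Best Practices below):

    # Request archival of a single file: it moves from live to archival,
    # and then to archived once all copies with the same billTo are in the archival state
    dx api project-xxxx archive '{"files": ["file-xxxx"]}'

    # Request unarchival of an archived file: it moves to unarchiving, then live
    dx api project-xxxx unarchive '{"files": ["file-xxxx"]}'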

Archive Service Operations

The File-based Archive Service allows users who have CONTRIBUTE or ADMINISTER permission on a project to archive or unarchive files that reside in that project.

Using the API, users can archive or unarchive files, folders, or entire projects, although the archival process itself happens at the file level. The API accepts a list of up to 1000 files per archival or unarchival request.

When archiving or unarchiving folders or projects, the API by default archives or unarchives all files at the root level and in subfolders recursively. If you archive a folder or project that contains files in different states, the Service archives only the files that are in the live state and skips files in other states. Likewise, if you unarchive a folder or project that contains files in different states, the Service unarchives only the files that are in the archived state, transitions files in the archival state back to the live state, and skips files in other states.
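
For example, a folder-level request might look like the sketch below. The folder path is a placeholder, and the folder input name is an assumption; check the Project API documentation for the authoritative request fields.

    # Archive every live file under /results, including files in subfolders (default recursive behavior);
    # "folder" is assumed here to be the input name for a folder-level request
    dx api project-xxxx archive '{"folder": "/results"}'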

Archival Billing

All fees associated with archiving a file are billed to the billTo organization of the project. There are several charges associated with archival:

  • Standard storage charge: The monthly storage charge for files located in standard storage on the platform. Files in the live and archival states incur this charge. The archival state indicates that the file is waiting to be archived, or that other copies of the same file in other projects are still in the live state, so the file remains in standard storage, such as AWS S3. The standard storage charge continues to be billed until all copies of the file have been requested for archival and the file has been moved to archival storage and transitioned into the archived state.

  • Archival storage charge: The monthly storage charge for files located in archival storage on the platform. Files in the archived state incur this charge.

  • Retrieval fee: A one-time charge, applied at the time of unarchival, based on the volume of data being unarchived.

  • Early unarchival fee: If you retrieve or delete data from archival storage before the required minimum retention period is met, an early unarchival fee applies. The retention period is 90 days for AWS regions and 180 days for Microsoft Azure regions. You are charged a pro-rated fee equivalent to the archival storage charges for any remaining days within that period, as illustrated below.
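
As a rough, hypothetical illustration of the pro-rating: if a file archived in an AWS region is unarchived 30 days later, 60 days remain in the 90-day retention period, so in addition to the retrieval fee you are charged approximately the archival storage cost that the file would have accrued over those remaining 60 days.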

Best Practices

When using the Archive Service, we recommend the following best practices.

  • If a file is shared in multiple projects, archiving one copy in one of the projects only transitions the file into the archival state, which still incurs the standard storage cost. To achieve the lower archival storage cost, you need to ensure that all copies of the file in all projects with the same billTo org are archived. Once all copies of the file have transitioned into the archival state, the Service automatically transitions them from the archival state to the archived state. We recommend using the allCopies option of the API to force archival of all copies of the file. You must be an org ADMIN of the billTo org of the current project to use the allCopies option.

Refer to the following example: file-xxxx has copies in project-xxxx, project-yyyy, and project-zzzz, which share the same billTo org (org-xxxx). You have ADMINISTER access to project-xxxx and CONTRIBUTE access to project-yyyy, but no role in project-zzzz. As an org ADMIN of the project's billTo org, you archive all copies of the file in all projects with the same billTo org using /project-xxxx/archive:

    1. List all the copies of file-xxxx in org-xxxx.

      dx api file-xxxx listProjects '{"archivalInfoForOrg":"org-xxxx"}'
      {
          "project-xxxx": "ADMINISTER",
          "project-yyyy": "CONTRIBUTE",
          "liveProjects": [
              "project-xxxx",
              "project-yyyy",
              "project-zzzz"
          ]
      }
    2. Force archiving all the copies of file-xxxx.

      dx api project-xxxx archive '{"files": ["file-xxxx"], "allCopies": true}'
      {
          "id": "project-xxxx",
          "count": 1
      }
    3. All copies of file-xxxx will be archived and transitioned into the archived state.

The Archive Service does not work on sponsored projects. If you want to archive files within a sponsored project, you must move the files into a different project or end the project sponsorship before archival.