Archiving Files
Learn how to archive files, a cost-effective way to retain files in accordance with data-retention policies while keeping them secure and accessible and preserving file provenance and metadata.
A license is required to use the DNAnexus Archive Service. Contact DNAnexus Sales for more information.
The archiving feature is file-based. Users can archive individual files, folders, or entire projects to save on storage costs, and can unarchive one or more files, folders, or projects when the data need to be available for further analyses.
The DNAnexus Archive Service is currently available via the application programming interface (API) in AWS and Microsoft Azure regions.
Overview
File Archival States
To follow the archival life cycle, the operations that can be performed on files, and how billing works, it helps to understand the file states associated with archival. A file in a project can assume one of four archival states:
| Archival state | Details |
| live | The file is in standard storage, such as AWS S3 or Azure Blob. |
| archival | Archival requested on the current file, but other copies of the same file are in the live state in other projects with the same billTo organization. The file is still in standard storage. |
| archived | The file is in archival storage, such as AWS S3 Glacier or Azure Blob ARCHIVE. |
| unarchiving | Unarchival requested on the current file. The file is in transition from archival storage to standard storage. |
The operations that can be performed on a file depend on its current archival state. The table below lists which operations are available in each state.
| Archival state | Download | Clone | Compute | Archive | Unarchive |
| live | Yes | Yes | Yes | Yes | No |
| archival | No | Yes* | No | No | Yes (Cancel archive) |
| archived | No | Yes | No | No | Yes |
| unarchiving | No | No | No | No | No |
* Clone operation would fail if the object is actively transitioning from archival to archived.
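The table above can be encoded as a small lookup, which is handy for checking an operation before attempting it. This is a minimal sketch: the names ArchivalState, ALLOWED_OPERATIONS, and is_allowed are hypothetical and not part of the DNAnexus API, and the lookup does not model the footnoted case in which a clone fails during the transition from archival to archived.

```python
from enum import Enum


class ArchivalState(Enum):
    """The four archival states described above."""
    LIVE = "live"
    ARCHIVAL = "archival"
    ARCHIVED = "archived"
    UNARCHIVING = "unarchiving"


# Operations permitted in each state, transcribed from the table above.
# Illustrative only: the platform itself is the source of truth.
ALLOWED_OPERATIONS = {
    ArchivalState.LIVE: {"download", "clone", "compute", "archive"},
    # In the archival state, "unarchive" cancels the pending archive request.
    ArchivalState.ARCHIVAL: {"clone", "unarchive"},
    ArchivalState.ARCHIVED: {"clone", "unarchive"},
    ArchivalState.UNARCHIVING: set(),
}


def is_allowed(state, operation):
    """Return True if the table permits `operation` on a file in `state`."""
    return operation in ALLOWED_OPERATIONS[ArchivalState(state)]
```

For example, is_allowed("archived", "download") returns False, while is_allowed("archival", "unarchive") returns True.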
File Archival Life Cycle
When the project-xxxx/archive API is called on a file object, the file transitions from the live state to the archival state. Only when all copies of the file in all projects with the same billTo organization are in the archival state does the platform automatically transition the file to the archived state.
Likewise, when the project-xxxx/unarchive API is called on a file in the archived state, the file transitions from the archived state to the unarchiving state. While in the unarchiving state, the file is being restored by the third-party storage platform (e.g., AWS or Azure). Unarchiving may take some time, depending on the retrieval option selected for the specific platform. Once unarchival is complete and the file is available in standard storage, the file transitions to the live state.
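The life cycle described above can be modeled as a few state transitions. The sketch below is a simplified illustration, not the platform's implementation: the function names are hypothetical, states are plain strings, and copies is assumed to be a mapping from project ID to the state of one file's copy within a single billTo organization.

```python
def request_archive(state):
    """Model an archive request: only a live file moves to archival."""
    return "archival" if state == "live" else state


def promote_if_all_archival(copies):
    """Model the automatic step: once every copy under the same billTo
    organization is in the archival state, all copies become archived."""
    if copies and all(state == "archival" for state in copies.values()):
        return {project: "archived" for project in copies}
    return dict(copies)


def request_unarchive(state):
    """Model an unarchive request on a single copy."""
    if state == "archived":
        return "unarchiving"
    if state == "archival":
        return "live"  # cancels a pending archive request
    return state


def complete_unarchive(state):
    """Model the end of restoration: unarchiving becomes live."""
    return "live" if state == "unarchiving" else state
```

With two copies of a file, archiving only one leaves both in standard storage; the promotion to archived happens only after the second copy is also archived.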
Archive Service Operations
The file-based Archive Service allows users who have CONTRIBUTE or ADMINISTER permission on a project to archive or unarchive files that reside in the project. Via API calls, users can archive or unarchive files, folders, or entire projects, although the archival process itself happens at the file level. The API accepts a list of up to 1000 files for archival or unarchival. When archiving or unarchiving folders or projects, the API by default processes all files at the root level and, recursively, those in subfolders. If you archive a folder or a project that includes files in different states, the Service archives only files in the live state and skips files in other states. Likewise, if you unarchive a folder or a project that includes files in different states, the Service unarchives only files in the archived state, transitions files in the archival state back to the live state, and skips files in other states.
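To preview what a folder-level request would do, or to split a long file list so that each call stays within the 1000-file limit, a client-side helper along these lines can be useful. This is a sketch under stated assumptions: the function names are hypothetical, states is assumed to be a mapping from file ID to archival state that you have already collected, and nothing here calls the platform.

```python
MAX_FILES_PER_CALL = 1000  # limit stated in the text above


def chunk_file_ids(file_ids, size=MAX_FILES_PER_CALL):
    """Split a list of file IDs into batches of at most `size` entries."""
    return [file_ids[i:i + size] for i in range(0, len(file_ids), size)]


def plan_folder_archive(states):
    """Return (to_archive, skipped): only live files are archived."""
    to_archive = [f for f, s in states.items() if s == "live"]
    skipped = [f for f, s in states.items() if s != "live"]
    return to_archive, skipped


def plan_folder_unarchive(states):
    """Group files by what an unarchive request does to them."""
    plan = {"unarchive": [], "back_to_live": [], "skipped": []}
    for file_id, state in states.items():
        if state == "archived":
            plan["unarchive"].append(file_id)
        elif state == "archival":
            plan["back_to_live"].append(file_id)  # pending archive is cancelled
        else:
            plan["skipped"].append(file_id)
    return plan
```

A list of 2500 file IDs, for instance, would be split into three batches of 1000, 1000, and 500.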
Archival Billing
All fees associated with archiving a file are billed to the billTo organization of the project. Several charges are associated with archival:
Standard storage charge: The monthly storage charge for files located in standard storage on the platform. Files in the live and archival states incur this charge. The archival state indicates that the file is waiting to be archived or that other copies of the same file in other projects are still in the live state, so the file remains in standard storage (such as AWS S3). The standard storage charge continues to be billed until archival has been requested for all copies of the file and the file has been moved to archival storage and transitioned into the archived state.
Archival storage charge: The monthly storage charge for files located in archival storage on the platform. Files in the archived state incur this charge.
Retrieval fee: The retrieval fee is a one-time charge at the time of unarchival based on the data volume being unarchived. Retrieval fees for third-party services can be found at:
Amazon AWS: https://aws.amazon.com/glacier/pricing
Microsoft Azure: https://azure.microsoft.com/en-us/pricing/details/storage/blobs
Early retrieval fee: Because the Archive Service is designed for long-term storage of infrequently used data, an additional fee applies to data retrieved before a minimum storage period has elapsed. For AWS regions, this period is 90 days; for Microsoft Azure regions, it is 180 days. Data unarchived before the minimum period has elapsed incur a pro-rated early retrieval charge, which is equal to the archival charge for the remaining days.
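The pro-rating described above amounts to simple arithmetic on the remaining days. The sketch below illustrates it under loud assumptions: the function names are hypothetical, and daily_archival_charge is a made-up per-day figure supplied by the caller, since how the platform derives a daily rate from the monthly archival charge is not stated here. Actual invoices may differ.

```python
# Minimum storage periods stated in the text above, in days.
MINIMUM_ARCHIVE_DAYS = {"aws": 90, "azure": 180}


def remaining_minimum_days(provider, days_archived):
    """Days still owed on the minimum storage period (never negative)."""
    return max(0, MINIMUM_ARCHIVE_DAYS[provider] - days_archived)


def early_retrieval_charge(provider, days_archived, daily_archival_charge):
    """Pro-rated charge: the archival charge for the remaining days.

    `daily_archival_charge` is a hypothetical per-day amount for the data
    being unarchived; it is an assumption of this sketch.
    """
    return remaining_minimum_days(provider, days_archived) * daily_archival_charge
```

For example, data unarchived from an AWS region after 30 days has 60 days remaining, while data unarchived from an Azure region after 200 days has none and incurs no early retrieval charge.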
Best Practices
When using the Archive Service, we recommend the following best practices.
The Archive Service does not work on sponsored projects. If you want to archive files within a sponsored project, then you must move files into a different project or end the project sponsorship before archival.
If a file is shared in multiple projects, archiving one copy in one of the projects only transitions the file into the archival state, which still incurs the standard storage cost. To achieve the lower archival storage cost, ensure that all copies of the file in all projects with the same billTo org are archived. When all copies of the file have transitioned into the archival state, the Service automatically transitions the file from the archival state to the archived state. We recommend using the allCopies option of the API to force archiving of all copies of the file. You must be the org ADMIN of the billTo org of the current project to use the allCopies option.
Refer to the following example: file-xxxx has copies in project-xxxx, project-yyyy, and project-zzzz, which share the same billTo org (org-xxxx). You have ADMINISTER permission on project-xxxx and CONTRIBUTE permission on project-yyyy, but no role in project-zzzz. You are the org ADMIN of the projects' billTo org, and you want to archive all copies of the file in all projects with the same billTo org using /project-xxxx/archive:
List all the copies of the file in org-xxxx.
Force archiving of all the copies of file-xxxx.
All copies of file-xxxx will be archived and transitioned into the archived state.
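As an illustration of the force-archive step in the example, the input for /project-xxxx/archive could be assembled as follows. This is a hedged sketch rather than the documented request format: allCopies is the option named above, but the files field name and the build_archive_input helper are assumptions made for illustration, and sending the request is left to whichever API client you use. Check the API reference for the exact input schema.

```python
import json

MAX_FILES_PER_CALL = 1000  # limit stated in Archive Service Operations


def build_archive_input(file_ids, all_copies=False):
    """Assemble a JSON-serializable input for an archive request.

    The "files" key is an assumed field name; "allCopies" is the option
    described in the text and requires org ADMIN rights on the billTo org.
    """
    file_ids = list(file_ids)
    if len(file_ids) > MAX_FILES_PER_CALL:
        raise ValueError("at most 1000 files can be listed per request")
    payload = {"files": file_ids}
    if all_copies:
        payload["allCopies"] = True
    return payload


payload = build_archive_input(["file-xxxx"], all_copies=True)
print(json.dumps(payload))  # {"files": ["file-xxxx"], "allCopies": true}
```

The helper omits allCopies unless it is requested, so the same function also covers the ordinary case of archiving only the copy in the current project.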