Backups of Customer Data
For customer data, such as data files, customer-specific apps, and pipelines, DNAnexus relies on the extremely high durability of the underlying cloud storage to protect against infrastructure-level failures, such as disk corruption and disk failures. This durability is achieved through replication of the data. Replication does not protect against user error: if a user accidentally deletes a file, the platform does not provide an “undelete” capability. Backups of customer data for those use cases are the responsibility of the customer.
First, identify the data that must be backed up. It is probably not everything in your directory. The backup scope should include data files from your sequencer, final analysis results, any applications you added to and configured on the platform, and your pipelines. Intermediate results of your pipeline that can be easily reproduced probably do not need to be backed up. Files shared with you may also be in scope, provided you have the rights to make copies of them.
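The scoping rules above can be written down as a small filter, which makes the decision repeatable and reviewable. The following is a minimal sketch, not a DNAnexus API: the `DataObject` record, its fields, and the category names are all hypothetical and would need to be mapped onto however your own inventory is produced. Excluding unknown categories by default is an assumption; you may prefer to include them.

```python
from dataclasses import dataclass

# Hypothetical category labels; adjust to how your own data is organized.
ALWAYS_BACK_UP = {"sequencer_data", "final_result", "app", "pipeline"}


@dataclass(frozen=True)
class DataObject:
    """Hypothetical inventory record for one object on the platform."""
    name: str
    category: str
    reproducible: bool = False    # can it be regenerated easily?
    shared_with_me: bool = False  # owned by someone else
    may_copy: bool = True         # do you hold the rights to copy it?


def in_backup_scope(obj: DataObject) -> bool:
    """Return True if the object belongs in the backup scope."""
    # Shared files are only in scope if you are allowed to copy them.
    if obj.shared_with_me and not obj.may_copy:
        return False
    if obj.category in ALWAYS_BACK_UP:
        return True
    # Intermediate results are skipped when they are easy to reproduce.
    if obj.category == "intermediate":
        return not obj.reproducible
    # Assumption: anything uncategorized is left out of scope.
    return False


inventory = [
    DataObject("run42.fastq.gz", "sequencer_data"),
    DataObject("variants.vcf.gz", "final_result"),
    DataObject("aligned.bam", "intermediate", reproducible=True),
    DataObject("manual_curation.tsv", "intermediate", reproducible=False),
    DataObject("partner_panel.bed", "final_result",
               shared_with_me=True, may_copy=False),
]

scope = [obj.name for obj in inventory if in_backup_scope(obj)]
print(scope)
```

Keeping the rules in one function means a change of policy, for example deciding that all intermediate results are in scope, is a one-line edit rather than a manual re-review.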
There are at least four ways to back up the files in scope:
- 1. Make a deep copy (new objectID) of the objects you want to back up into a new "backup" project. Be sure to make a deep copy and not a shallow copy (same objectID), as the latter is merely a pointer and not a physically separate object. Revoke all access to this "backup" project so that nobody can accidentally delete its contents; you might also consider putting the data in a separate org. Then archive these items. Archiving “moves” the files from the working directory into the archive. To test the integrity of the archived data, unarchive it and run your tests to prove that your backup works. While the archive is in the same cloud region, it is probably not in the same physical location as the spinning storage. This is the simplest and least expensive option.
- 2. Make full replicas of the files in scope in another cloud within your license (e.g. AWS-Frankfurt to Azure-Amsterdam). On AWS and Azure, you will pay egress charges at the source. If you are concerned about the storage charges at the destination, you can archive the files there. Make sure you have the “archive” license from DNAnexus. If you're not sure whether you have this license, contact DNAnexus Sales.
- 3. Download the files and store them outside the DNAnexus account. Unlike the two options above, the security and traceability of the copied files become the responsibility of the customer. This option may be a requirement of your institution.
- 4. Treat all files on DNAnexus as work products: retain a “gold” copy locally, and download the pipelines and the output files of runs on the DNAnexus Platform.
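Whichever option you choose, a backup is only proven once a restored copy has been checked against the original. One common way to do this for downloaded or unarchived files is to record a checksum when the backup is taken and compare it after restore. The sketch below uses only the Python standard library; the manifest format (file name mapped to SHA-256 hex digest) and the function names are assumptions for illustration, not part of the DNAnexus Platform.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(manifest: Dict[str, str], backup_dir: Path) -> List[str]:
    """Return the names of files that are missing or whose checksum differs."""
    failures = []
    for name, expected in manifest.items():
        candidate = backup_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(name)
    return failures


# Small demonstration with a temporary directory standing in for the
# restored backup location.
with tempfile.TemporaryDirectory() as tmp:
    restored = Path(tmp)
    (restored / "variants.vcf").write_bytes(b"##fileformat=VCFv4.2\n")

    # In practice the manifest is written when the backup is taken.
    manifest = {"variants.vcf": sha256_of(restored / "variants.vcf")}
    clean_result = verify_backup(manifest, restored)

    # Simulate silent corruption of the restored file.
    (restored / "variants.vcf").write_bytes(b"corrupted")
    corrupt_result = verify_backup(manifest, restored)

print(clean_result, corrupt_result)
```

Storing the manifest separately from the backup itself is a sensible precaution, since a manifest that is lost or altered together with the data cannot prove anything.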
Note that all approaches that involve making a deep copy of an object, where a new object is created, make provenance tracking a challenge. The initial record of a copied file will reference the job that created it, but a reviewer would need to read the job log to establish the link back to the source object. One solution is to add metadata to the target file that points back to the source.
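As an illustration of the metadata approach, the sketch below builds a small set of string key/value pairs that could be attached to the copied object, for example as object properties. The key names, the helper function, and the placeholder IDs are all hypothetical; how the values are actually written to the object depends on the tooling you use and is deliberately left out.

```python
from datetime import datetime, timezone
from typing import Dict


def provenance_properties(source_id: str,
                          source_project: str,
                          copy_job_id: str) -> Dict[str, str]:
    """Build string metadata that links a backup copy to its source.

    All key names are illustrative assumptions, not a platform convention.
    """
    return {
        "backup_source_id": source_id,
        "backup_source_project": source_project,
        "backup_copy_job": copy_job_id,
        # UTC timestamp so records from different sites are comparable.
        "backup_created": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder IDs; substitute the real object, project, and job IDs.
props = provenance_properties("file-xxxx", "project-xxxx", "job-xxxx")
for key, value in props.items():
    print(f"{key}={value}")
```

With this in place, a reviewer can read the source object ID directly from the copy instead of reconstructing it from the job log.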