Symlinks
Learn to use the Symlinks feature to access, work with, and modify files that are stored on an external cloud service.
The DNAnexus Symlinks feature lets users link external data files in AWS S3 and Azure blob storage as objects on the Platform, and use those objects as though they were native DNAnexus file objects.
No storage costs are incurred when using symlinked files on the Platform. When used by jobs, symlinked files are downloaded to the Platform at runtime.
Symlinked files stored in AWS S3 or Azure blob storage are made accessible on DNAnexus via a Symlink Drive. The drive contains the necessary cloud storage credentials, and can be created by following Step 1 below.
Symlink Drives are set up via the CLI. To set one up, provide the following information:
A name for the Symlink Drive
The cloud service (AWS or Azure) where your files are stored
The access credentials required by the service
# Create a Symlink Drive backed by AWS S3
dx api drive new '{
    "name" : "<drive_name>",
    "cloud" : "aws",
    "credentials" : {
        "accessKeyId" : "<my_aws_access_key>",
        "secretAccessKey" : "<my_aws_secret_access_key>"
    }
}'

# Create a Symlink Drive backed by Azure blob storage
dx api drive new '{
    "name" : "<drive_name>",
    "cloud" : "azure",
    "credentials" : {
        "account" : "<my_azure_storage_account_name>",
        "key" : "<my_azure_storage_access_key>"
    }
}'
After you've entered the appropriate command, a new drive object will be created. You'll see a confirmation message that includes the id of the new Symlink Drive, in the format drive-xxxx.
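For scripting, the drive id can be pulled out of that confirmation message. The following is a minimal sketch, assuming the message is JSON that contains the id; the sample response is a hypothetical stand-in, not captured Platform output:

```shell
# Hypothetical sample of the confirmation message; in practice this would be
# the captured output of: dx api drive new '{ ... }'
response='{"id": "drive-xxxx"}'

# Extract the first token that looks like a drive id (drive- plus letters/digits).
drive_id=$(printf '%s\n' "$response" | grep -o 'drive-[A-Za-z0-9]*' | head -n 1)

echo "Symlink Drive ID: $drive_id"
```

Keeping the id in a variable makes it easier to reuse when linking projects or updating credentials later.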
By associating a DNAnexus Platform project with a Symlink Drive, you can both:
Have all new project files automatically uploaded to the AWS S3 bucket or Azure blob to which the Drive links
Enable project members to work with those files
Note that "new project files" includes all of the following:
Newly created files
File outputs from jobs
Files uploaded to the project
Note that non-symlinked files cloned into a symlinked project will not be uploaded to the linked AWS S3 bucket or Azure blob.
When creating a new project via the UI, you can link it with an existing Symlink Drive by toggling the Enable Auto-Symlink in This Project setting to "On":
Next:
In the Symlink Drive field, select the drive with which the project should be linked
In the Container field, enter the name of the AWS S3 bucket or Azure blob where newly created files should be stored
Optionally, in the Prefix field, enter the name of a folder within the AWS S3 bucket or Azure blob where these files should be stored
In order to ensure that files can be saved to your AWS S3 bucket or Azure blob, you must enable CORS for that remote storage container.
Use the following JSON object when configuring CORS for the bucket:
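The JSON object itself is not reproduced in this text. As a sketch of the general shape only, the following writes an S3 CORS rule set in the form accepted by the AWS CLI command aws s3api put-bucket-cors. The origin, methods, headers, and max age shown are illustrative assumptions, not the values DNAnexus requires, and <bucket_name> is a placeholder. Confirm the actual rules before applying anything:

```shell
# Illustrative values only (assumptions); replace with the rules DNAnexus specifies.
cors_file=$(mktemp)
cat > "$cors_file" <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://platform.dnanexus.com"],
      "AllowedMethods": ["PUT", "POST"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3600
    }
  ]
}
EOF

# To apply the rules to a bucket (requires the AWS CLI and suitable permissions):
#   aws s3api put-bucket-cors --bucket <bucket_name> --cors-configuration "file://$cors_file"
echo "CORS rules written to $cors_file"
```

Note that the S3 console accepts the bare array of rules, while the AWS CLI expects it wrapped in a "CORSRules" key as shown here.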
Working with symlinked files is largely the same as working with files that are stored on the Platform. These files can, for example, be used as inputs to apps, applets, or workflows.
If you rename a symlink on DNAnexus, this does not change the name of the file in S3 or Azure blob storage. Note that in this example, the symlink has been renamed from the original name file.txt to Example File. The remote filename, as shown in the Remote Path field in the right-side info pane, remains file.txt:
If you delete a symlink on the Platform, the file to which it points is not deleted.
If your cloud access credentials change, you must update the definition of all Symlink Drives to keep using files to which those Drives provide access.
To update a drive definition with new AWS access credentials, use the following command:
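The command itself is not reproduced in this text. The following is a minimal sketch, assuming the drive's update API method accepts the same credentials object that drive new takes above; drive-xxxx and the credential values are placeholders:

```shell
# Placeholder drive id and credentials; substitute real values before running.
drive_id="drive-xxxx"
aws_update_payload='{
    "credentials" : {
        "accessKeyId" : "<my_new_aws_access_key>",
        "secretAccessKey" : "<my_new_aws_secret_access_key>"
    }
}'

# Assumed form: dx api <drive_id> update <input_json>
# Prints the command by default; set DRY_RUN=0 to send the request.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "Would run: dx api $drive_id update '<payload>'"
else
  dx api "$drive_id" update "$aws_update_payload"
fi
```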
To update a drive definition with new Azure access credentials, use the following command:
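The command itself is not reproduced in this text. The following is a minimal sketch, assuming the drive's update API method accepts the same Azure credentials object that drive new takes above; drive-xxxx and the credential values are placeholders:

```shell
# Placeholder drive id and credentials; substitute real values before running.
azure_drive_id="drive-xxxx"
azure_update_payload='{
    "credentials" : {
        "account" : "<my_azure_storage_account_name>",
        "key" : "<my_new_azure_storage_access_key>"
    }
}'

# Assumed form: dx api <drive_id> update <input_json>
# Prints the command by default; set DRY_RUN=0 to send the request.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "Would run: dx api $azure_drive_id update '<payload>'"
else
  dx api "$azure_drive_id" update "$azure_update_payload"
fi
```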
Moving a symlinked file within a project only moves the symlink within the project. The change is not mirrored in the linked S3 or Azure blob container.
If a job cannot retrieve the source file behind a symlink, the job fails.
You can copy a symlinked file from one project to another. This includes copying symlinked files from a symlink-enabled project to a project without this feature enabled.
Yes, egress charges will be incurred.
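If a file uploaded to a symlink-enabled project has the same name as a file already stored in the linked AWS S3 bucket or Azure blob, the uploaded file will overwrite, or "clobber," the file that shares its name, and only the newly uploaded file will be stored in the AWS S3 bucket or Azure blob.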
This is true even if, within your project, you first renamed the symlinked file and then uploaded a new file with the prior name. For example, if you upload a file named file.txt to your DNAnexus project, the file is automatically uploaded to the specified directory in your S3 bucket or Azure blob. If you then rename the file on DNAnexus from file.txt to file.old.txt and upload a new file called file.txt to the project, the original file.txt in S3 or Azure blob storage is overwritten. Your DNAnexus project will still contain both the file.txt and file.old.txt symlinks, but trying to access the original file.old.txt symlink will likely result in a checksum error.
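The sequence above can be modeled locally. This is a toy simulation, not behavior captured from the Platform: a temporary directory stands in for the bucket, and a checksum recorded at upload time stands in for whatever the Platform verifies (an assumption about how the mismatch arises):

```shell
# Toy model only: a temp directory plays the role of the S3 bucket or Azure blob.
bucket=$(mktemp -d)

# 1. Upload file.txt: the object is stored remotely under its own name.
printf 'first version\n' > "$bucket/file.txt"
old_remote="$bucket/file.txt"        # remote path recorded by the first symlink
old_sum=$(cksum < "$old_remote")     # checksum recorded at upload time

# 2. Rename the symlink on the Platform: the remote path does not change.
old_name="file.old.txt"

# 3. Upload a new file.txt: same remote name, so the first object is overwritten.
printf 'second version\n' > "$bucket/file.txt"

# 4. Reading file.old.txt now fetches the new bytes, which no longer match.
if [ "$(cksum < "$old_remote")" = "$old_sum" ]; then
  clobber_result="checksum ok"
else
  clobber_result="checksum mismatch"
fi
echo "$old_name -> $clobber_result"
rm -rf "$bucket"
```

In this model the renamed symlink still points at the remote name file.txt, which is why the second upload replaces the data it refers to.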
If the auto-symlink feature has been enabled for a project, billing responsibility for the project cannot be transferred. Attempting to do so via API call will return a PermissionDenied error.
Note that when your cloud service access credentials change, you must update the definition of each Symlink Drive that links to the cloud service in question, as described above.
When creating a new project via the CLI, you can link it to a Symlink Drive by using the optional argument --default-symlink with dx new project. See the documentation for dx new project for details on inputs and input format.
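A minimal sketch of such a call, assuming --default-symlink takes a JSON object whose drive, container, and prefix keys mirror the Symlink Drive, Container, and Prefix fields of the UI. The key names, project name, and values are assumptions to check against the CLI help before use:

```shell
# Hypothetical project name and placeholder values; key names are assumed.
project_name="my-symlinked-project"
default_symlink='{"drive": "drive-xxxx", "container": "<bucket_or_container_name>", "prefix": "/<folder>"}'

# Prints the command by default; set DRY_RUN=0 to actually create the project.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "Would run: dx new project \"$project_name\" --default-symlink '$default_symlink'"
else
  dx new project "$project_name" --default-symlink "$default_symlink"
fi
```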
See the AWS documentation for guidance in enabling CORS for an S3 bucket.
See the Azure documentation for general guidance on enabling CORS for an Azure blob.