Symlinks

Use Symlinks to access, work with, and modify files that are stored on an external cloud service.

A license is required to use Symlinks. Contact DNAnexus Sales for more information.

Overview

The DNAnexus Symlinks feature lets users link external data files on Amazon S3 and Azure Blob Storage as objects on the Platform, and use those objects as though they were native DNAnexus file objects.

No storage costs are incurred when using symlinked files on the Platform. When used by jobs, symlinked files are downloaded to the Platform at runtime.

DNAnexus validates the integrity of symlinked files on the DNAnexus Platform using recorded MD5 checksums. However, DNAnexus cannot control or monitor changes made to these files in a customer's cloud storage. It is each customer's responsibility to safeguard files in their own cloud storage against modification, removal, and security breaches.

Quickstart

Symlinked files stored in Amazon S3 or Azure Blob Storage are made accessible on DNAnexus via a Symlink Drive. The drive contains the necessary cloud storage credentials, and can be created by following Step 1 below.

Step 1. Create a Symlink Drive

To set up a Symlink Drive, use the CLI to provide the following information:

  • A name for the Symlink Drive

  • The cloud service (AWS or Azure) where your files are stored

  • The access credentials required by the service

AWS

dx api drive new '{
    "name" : "<drive_name>",
    "cloud" : "aws",
    "credentials" : {
        "accessKeyId" : "<my_aws_access_key>",
        "secretAccessKey" : "<my_aws_secret_access_key>"
    }
}'
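To avoid typing keys directly into the command, the payload can be assembled from shell variables first. The following is a minimal sketch, not an official DNAnexus helper: the function name build_aws_drive_payload and the use of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables are assumptions made for illustration, and the function does not escape quotes or backslashes in its arguments.

```shell
# Hypothetical helper: prints the JSON payload for `dx api drive new`.
# Field names are taken from the AWS example above.
build_aws_drive_payload() {
    # $1 = drive name, $2 = AWS access key ID, $3 = AWS secret access key
    printf '{"name": "%s", "cloud": "aws", "credentials": {"accessKeyId": "%s", "secretAccessKey": "%s"}}\n' "$1" "$2" "$3"
}

# Example call (requires a logged-in dx session, so it is left commented out):
# dx api drive new "$(build_aws_drive_payload my_drive "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY")"
```

Keeping the payload in a function makes it easy to inspect the JSON before sending it.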

Azure

After you run the appropriate command, a new drive object is created. A confirmation message shows the ID of the new Symlink Drive, in the format drive-xxxx.
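If the drive ID is needed in a later step, it can be pulled out of the confirmation output. The sketch below assumes only that the output contains an ID of the form drive-xxxx, as described above. The function name extract_drive_id and the sample string are hypothetical.

```shell
# Hypothetical helper: reads text on stdin and prints the first drive ID found.
extract_drive_id() {
    grep -o 'drive-[A-Za-z0-9]*' | head -n 1
}

# Sample input for illustration only; real output comes from `dx api drive new`.
sample_output='{"id": "drive-xxxx"}'
drive_id=$(printf '%s\n' "$sample_output" | extract_drive_id)
echo "$drive_id"
```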

When your cloud service access credentials change, you must update the definition of each Symlink Drive that links to the cloud service. See Updating Cloud Service Access Credentials.

Step 2. Link a Project with a Symlink Drive

By associating a DNAnexus Platform project with a Symlink Drive, you can both:

  • Have all new project files automatically uploaded to the Amazon S3 bucket or Azure Blob Storage container to which the Drive links

  • Enable project members to work with those files

"New project files" includes the following:

  • Newly created files

  • File outputs from jobs

  • Files uploaded to the project

Non-symlinked files cloned into a symlinked project are not uploaded to the linked Amazon S3 bucket or Azure Blob Storage container.

When creating a new project via the UI, you can link it with an existing Symlink Drive by toggling the Enable Auto-Symlink in This Project setting to "On".

Next:

  • In the Symlink Drive field, select the drive with which the project should be linked

  • In the Container field, enter the name of the Amazon S3 bucket or Azure Blob Storage container where newly created files should be stored

  • Optionally, in the Prefix field, enter the name of a folder within the Amazon S3 bucket or Azure Blob Storage container where these files should be stored

When creating a new project via the CLI, you can link it to a Symlink Drive by using the optional argument --default-symlink with dx new project. See the manual for dx new project for details on inputs and input format.
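As a rough sketch of what the --default-symlink value might look like, the helper below builds a JSON object from a drive ID, a bucket or container name, and an optional prefix. The key names drive, container, and prefix mirror the UI fields described above and are an assumption here, as is the helper name build_default_symlink. Confirm the exact input format in the manual for dx new project.

```shell
# Hypothetical helper: prints a JSON value for --default-symlink.
# Key names ("drive", "container", "prefix") are assumed, not confirmed here.
build_default_symlink() {
    # $1 = drive ID, $2 = bucket or container name, $3 = optional prefix (folder)
    prefix=${3:-}
    if [ -n "$prefix" ]; then
        printf '{"drive": "%s", "container": "%s", "prefix": "%s"}\n' "$1" "$2" "$prefix"
    else
        printf '{"drive": "%s", "container": "%s"}\n' "$1" "$2"
    fi
}

# Example call (requires a logged-in dx session, so it is left commented out):
# dx new project "My Project" --default-symlink "$(build_default_symlink drive-xxxx my-bucket /results)"
```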

Step 3. Enable CORS

To ensure that files can be saved to your Amazon S3 bucket or Azure Blob Storage container, you must enable Cross-Origin Resource Sharing (CORS) for that remote storage location.

Enabling CORS for an Amazon S3 bucket

Refer to Amazon documentation for guidance on enabling CORS for an S3 bucket.

Use the following JSON object when configuring CORS for the bucket:

Enabling CORS for an Azure Blob Storage container

Refer to Microsoft documentation for guidance on enabling CORS for Azure Storage.

Working with Symlinked Files

Working with symlinked files is similar to working with files that are stored on the Platform. For example, symlinked files can be used as inputs to apps, applets, or workflows.

Renaming a symlink on DNAnexus does not change the name of the file in the Amazon S3 bucket or Azure Blob Storage container. For example, if a symlink is renamed from its original name, file.txt, to Example File, the remote filename shown in the Remote Path field in the right-side info pane remains file.txt.

If you delete a symlink on the Platform, the file to which it points is not deleted.

Updating Cloud Service Access Credentials

If your cloud access credentials change, you must update the definition of all Symlink Drives to keep using files to which those Drives provide access.

AWS

To update a drive definition with new AWS access credentials, use the following command:
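A minimal sketch of such an update, assuming the drive's update route accepts the same credentials object used by dx api drive new. The helper name build_aws_credentials_update and the dx api drive-xxxx update invocation are assumptions, not confirmed by this page. Check the API endpoints for working with Symlink Drives before relying on them.

```shell
# Hypothetical helper: prints a credentials-only payload for updating a drive.
# Field names match the AWS credentials used when the drive was created.
build_aws_credentials_update() {
    # $1 = new AWS access key ID, $2 = new AWS secret access key
    printf '{"credentials": {"accessKeyId": "%s", "secretAccessKey": "%s"}}\n' "$1" "$2"
}

# Assumed invocation (requires a logged-in dx session, so it is left commented out):
# dx api drive-xxxx update "$(build_aws_credentials_update "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY")"
```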

Azure

To update a drive definition with new Azure access credentials, use the following command:

Learn More

For more information, see API endpoints for working with Symlink Drives.

FAQ

What happens if I move a symlinked file from one folder to another, within a DNAnexus project? Does the file also mirror that move within the Amazon S3 bucket or Azure Blob Storage container?

No, the symlinked file only moves within the project. The change is not mirrored in the linked S3 bucket or Azure container.

What happens if I delete a symlinked file directly on Amazon S3 or Azure Blob Storage, and a job tries to access the symlinked object on DNAnexus?

The job fails because it cannot retrieve the source file.

Can I copy a symlinked file from one project to another and still keep access?

Yes, you can copy a symlinked file from one project to another. This includes copying symlinked files from a symlink-enabled project to a project without this feature enabled.

Yes, egress charges are incurred.

If you upload a file whose name matches a file already stored in the linked location, the uploaded file overwrites, or "clobbers," the existing file, and only the newly uploaded file is stored in the Amazon S3 bucket or Azure container.

This behavior applies even if, within your project, you first rename the symlinked file and then upload a new file with the prior name.

For example:

  • You upload a file named file.txt to your DNAnexus project. The file is automatically uploaded to your S3 bucket or Azure container in the specified directory.

  • You then rename the file on DNAnexus from file.txt to file.old.txt.

  • Next, you upload a new file to the project called file.txt.

In this case, the original file.txt that was uploaded to S3 or Azure is overwritten. However, you still have both file.txt and file.old.txt symlinks in your DNAnexus project. Trying to access the original file.old.txt symlink results in a checksum error.
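The checksum error follows from comparing the MD5 recorded for the first upload with the MD5 of the object that replaced it. The sketch below illustrates that comparison locally. It is not the Platform's actual validation code: the helper name md5_of and the sample contents are hypothetical, and it assumes md5sum from GNU coreutils is available (macOS ships md5 instead).

```shell
# Illustration only: compares the MD5 of the original contents with the MD5 of
# the contents that replaced them. Requires md5sum (GNU coreutils).
md5_of() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Checksum recorded when the first file.txt was uploaded.
recorded=$(md5_of 'contents of the first file.txt')
# Checksum of the object now stored under the same remote name.
current=$(md5_of 'contents of the second file.txt')

if [ "$recorded" != "$current" ]; then
    result="checksum mismatch"
else
    result="checksum ok"
fi
echo "$result"
```

Because file.old.txt still points at the remote object named file.txt, its recorded checksum no longer matches the contents stored there.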

If the auto-symlink feature has been enabled for a project, billing responsibility for the project cannot be transferred. Attempting to do so via API call returns a PermissionDenied error.
