Upload Agent

Introduction

The DNAnexus Upload Agent is a fast and convenient command-line client for uploading files to DNAnexus. It is particularly recommended for uploading multiple or large files, because it can resume previously interrupted uploads.

Installing Upload Agent: Follow the instructions on the Upload Agent download page to download the Upload Agent executable.

For the rest of this document, ua represents the Upload Agent executable, but you should replace it with the path to where you have saved the Upload Agent executable on your local file system.

Basic Usage

Synopsis

./ua [options] <file> [...]

Usage

The following examples assume that you have set your environment variables, specifically, the authentication token and current workspace (project).

You can always override these environment variables by using the --auth-token and --project command-line options.

Default Compression Behavior: By default, Upload Agent automatically compresses uncompressed files before uploading and appends .gz to the filename. This improves upload efficiency and reduces storage costs.

To see the current environment variables being used by ua, run:

$ ./ua --env
  API server protocol: https
  API server host:   api.dnanexus.com
  API server port:    ---
  Auth token: <TOKEN>
  Current Project: my_project (project-xxxx)

Running a Diagnostic Test

Running Upload Agent with the --test flag runs a diagnostic test that verifies ua is correctly configured. Upload Agent prints any errors it finds as part of the test output.

Uploading a Single File

You can upload a single file using the Upload Agent. The following example shows how to upload a local file named my_file.txt to the project called my_project.
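As a minimal sketch of a single-file upload in a script, the wrapper below captures the file ID that ua prints to standard output. The `UA` variable and the `upload_one` function are illustrative names introduced here, not part of Upload Agent, and the sketch assumes the environment variables are already set.

```shell
#!/bin/sh
# Path to the Upload Agent executable (assumption: adjust to your install location).
UA="${UA:-./ua}"

# upload_one FILE: upload one local file and print the new file ID.
# Per the Output section, ua prints "Failed" instead of an ID on failure.
upload_one() {
    id=$("$UA" "$1") || return 1
    [ "$id" != "Failed" ] || return 1
    printf '%s\n' "$id"
}

# Usage (requires a configured ua):
#   file_id=$(upload_one my_file.txt)
```

Run directly, the equivalent command is `./ua my_file.txt`.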

By default, uncompressed files are automatically compressed during upload, so when you view the uploaded file it is named my_file.txt.gz.

Uploading Multiple Files to the Same Project

You can upload multiple files to the same project. In the following example, two local files, my_file_1.txt and my_file_2.txt, are uploaded to the project my_project. By default, uncompressed files are automatically compressed during upload.

File IDs are printed in the same order as the files given on the command line. In this example, the first and second output lines are the new file IDs generated by uploading my_file_1.txt and my_file_2.txt, respectively.
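Because the output order matches the input order, a script can pair each local file with its result line. The following is a sketch only; `pair_ids` is a hypothetical helper, and it relies on nothing beyond the ordering described here.

```shell
#!/bin/sh
# pair_ids FILE...: read ua's standard output (one line per uploaded file)
# from stdin and print "local_name<TAB>result" for each file, in order.
# The result is either a file ID or the string "Failed" (see the Output section).
pair_ids() {
    for f in "$@"; do
        # If ua printed fewer lines than expected, mark the entry as missing.
        IFS= read -r id || id="(missing)"
        printf '%s\t%s\n' "$f" "$id"
    done
}

# Usage (requires a configured ua):
#   ./ua my_file_1.txt my_file_2.txt | pair_ids my_file_1.txt my_file_2.txt
```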

Uploading Directories

You can upload all the files in a given directory. By default, uncompressed files are automatically compressed during upload.

The destination of the files depends on the directory name given as input. If the name has a trailing /, Upload Agent does not create the directory; instead, it copies the contents of the folder to the destination path on the Platform.

Without a trailing /, a new remote directory is created and the files are uploaded to the new directory (dir_name).
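The trailing-slash rule can be modeled with a small helper. This is a sketch of the rule as described here, not Upload Agent's actual implementation: `remote_dir` and its DEST parameter are hypothetical, and the use of the directory's base name for the new remote directory is an assumption.

```shell
#!/bin/sh
# remote_dir DEST LOCAL_DIR: print the remote folder that the files in
# LOCAL_DIR are expected to land in, given destination folder DEST.
remote_dir() {
    dest=${1%/}    # drop a trailing slash from DEST; "/" becomes ""
    case $2 in
        */) printf '%s\n' "${dest:-/}" ;;                   # trailing /: contents go directly into DEST
        *)  printf '%s/%s\n' "$dest" "$(basename "$2")" ;;  # no trailing /: DEST/dir_name is created
    esac
}

remote_dir /data dir_name/   # files go into /data
remote_dir /data dir_name    # files go into /data/dir_name
```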

You can upload at most 1000 files in a single operation.

Uploading Directories Recursively

You can upload a directory recursively using the --recursive flag. The destination directory follows the same rules as above: with a trailing /, ua assumes that the destination directory exists; without a trailing /, Upload Agent creates a new directory if it does not already exist.

Uploading Data from stdin

You can upload data from stdin directly into a file by using the --read-from-stdin flag. With this flag, you can upload only a single file. This can be useful when you need to pipe output from a program and upload it as a file.

This command reads data interactively from the terminal until the stream is terminated with <CTRL>+D, which represents the end of the file (EOF).

Redirecting Uploaded Files

Redirecting to a Folder

You can change the final path of the file in the project via the flags --folder and --name. The following command uploads my_file_1.txt into the folder called oldData and renames it to file_1. Due to the default automatic compression, the final filename on the Platform is file_1.gz.
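A sketch of that command wrapped in a reusable function. `UA` and `upload_as` are illustrative names, and writing the folder as an absolute path (/oldData) is an assumption of this sketch.

```shell
#!/bin/sh
# Path to the Upload Agent executable (assumption: adjust to your install location).
UA="${UA:-./ua}"

# upload_as FOLDER NAME FILE: upload FILE into FOLDER under the name NAME.
upload_as() {
    "$UA" --folder "$1" --name "$2" "$3"
}

# Usage (requires a configured ua):
#   upload_as /oldData file_1 my_file_1.txt
```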

Automatic Compression

By default, Upload Agent automatically compresses uncompressed files before uploading them and appends .gz to the filename. This compression improves upload efficiency and reduces storage costs for text-based files, such as FASTA, FASTQ, and CSV files.

Disabling Compression

To upload files without compression, use the --do-not-compress flag. This preserves the original filename and content without any modification.

Files that are already compressed (for example, .gz, .bz2, .zip) are never recompressed during upload.
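The naming rules above can be summarized in a short helper that predicts the name a file will have on the Platform. This is a sketch under stated assumptions: `remote_name` is hypothetical, and its extension list covers only the examples named here (.gz, .bz2, .zip), while Upload Agent's real detection of compressed files may differ.

```shell
#!/bin/sh
# remote_name FILE [--do-not-compress]: print the expected remote filename.
remote_name() {
    name=$(basename "$1")
    # With --do-not-compress, the original filename is preserved.
    if [ "${2:-}" = "--do-not-compress" ]; then
        printf '%s\n' "$name"
        return
    fi
    case $name in
        *.gz|*.bz2|*.zip) printf '%s\n' "$name" ;;     # already compressed: name unchanged
        *)                printf '%s.gz\n' "$name" ;;  # compressed on upload: .gz appended
    esac
}

remote_name my_file.txt                     # my_file.txt.gz
remote_name my_file.txt --do-not-compress   # my_file.txt
```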

Preventing the Resumption of Previous Uploads

By default, Upload Agent attempts to resume every upload it can. If you want to upload the same file twice, override this behavior with the --do-not-resume flag.

If Upload Agent fails to upload a file, or uploads it only partially, resume the upload by running the same command again. To identify the upload to resume, Upload Agent generates a file signature from the following information:

  • size

  • modifiedTimestamp

  • toCompress (boolean reflecting whether the file was originally uploaded with --do-not-compress)

  • chunkSize

  • the canonical path to the file

This information is summarized as a metadata field on the file object. When you upload a file using Upload Agent, it quickly calculates this file signature and searches your current project for any file with the same signature. If it finds such an object, and if the file upload is incomplete, it tries to resume the upload. If the file upload is complete, then the file signature is added as a property.

Waiting for a File to Close

When scripting, use the --wait-on-close flag to make ua wait until the uploaded files are in the closed state before proceeding to the next command. You do not have to wait for a file to close before giving it as input to an app or applet, because the Platform automatically waits for the file to close before starting the job. However, to copy a file between projects, you must first wait for it to reach the closed state.

Monitoring Upload Progress

You can turn on progress reporting (printed to stderr) with the --progress flag.

Uploading Files With Metadata

Details

Assigning File Details

Upload Agent can set details for a file using the --details flag. The details must be passed as a valid JSON string. For more information about JSON, see the Wikipedia page on JSON.

Assigning Details to Multiple Files

The following command assigns the same details to all the files being uploaded.

Assigning Different Details to Multiple Files

You need to provide one --details flag per file uploaded.

Properties

Upload Agent can assign properties to a file during upload using the --property flag.

Assigning a Property to a Single File

Assigning Multiple Properties to a Single File
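A hedged sketch of assigning one or more properties in a single upload. Two things here are assumptions rather than facts stated in this document: that each property is written as KEY=VALUE, and that --property may be repeated once per property. Check the help string of your ua version before relying on either. `UA` and `upload_with_properties` are illustrative names.

```shell
#!/bin/sh
# Path to the Upload Agent executable (assumption: adjust to your install location).
UA="${UA:-./ua}"

# upload_with_properties FILE KEY=VALUE...: upload FILE, passing one
# --property flag per KEY=VALUE pair (assumed syntax, see the note above).
upload_with_properties() {
    file=$1
    shift
    n=$#
    # Rotate through the pairs, appending "--property PAIR" for each one.
    while [ "$n" -gt 0 ]; do
        set -- "$@" --property "$1"
        shift
        n=$((n - 1))
    done
    "$UA" "$@" "$file"
}

# Usage (requires a configured ua; property names are made up):
#   upload_with_properties my_file.txt sample=NA12878 run=42
```

The same function covers the single-property case when only one pair is given.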

Advanced Usage

Changing the Number of Threads

You can specify a different number of threads for compression and a different number of outgoing HTTPS connections to be opened to upload the file chunks by using the flags --compress-threads and --upload-threads, respectively. The number of threads used to read the input files can be changed by the --read-threads flag.

For example, if you are uploading files from an eight-core machine, we recommend limiting usage to 75% of the machine's capacity as a safety measure and dividing it evenly among the three options. That leaves six cores, so reading the input data (--read-threads), compressing (--compress-threads), and uploading (--upload-threads) each get two threads. The command would look something like this:
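A sketch of that calculation and the resulting command. The helper names and the floor of one thread per stage are choices made for this example, not Upload Agent behavior.

```shell
#!/bin/sh
# Path to the Upload Agent executable (assumption: adjust to your install location).
UA="${UA:-./ua}"

# threads_per_stage CORES: take 75% of the cores and split them evenly across
# the three stages (read, compress, upload), with a floor of one thread each.
threads_per_stage() {
    n=$(( $1 * 75 / 100 / 3 ))
    [ "$n" -ge 1 ] || n=1
    printf '%s\n' "$n"
}

# upload_with_threads CORES FILE...: pass the same thread count to all three flags.
upload_with_threads() {
    n=$(threads_per_stage "$1")
    shift
    "$UA" --read-threads "$n" --compress-threads "$n" --upload-threads "$n" "$@"
}

# Usage (requires a configured ua):
#   upload_with_threads 8 my_file.txt
```

On an eight-core machine this runs `./ua --read-threads 2 --compress-threads 2 --upload-threads 2 my_file.txt`.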

Using a Different Chunk Size

You can change the chunk size that is uploaded at a time in each thread using the flag --chunk-size. This parameter depends on the memory available on the machine. We recommend that you keep the default value. However, if your network connection is particularly slow, use a smaller chunk size.

The following command splits up large-file.txt into chunks of size 200MB (209,715,200 bytes) each to be uploaded. By default, the chunk size is ~95MB (100,000,000 bytes). Upload Agent has a maximum limit of 10,000 chunks.
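The default chunk size and the chunk limit together bound the size of a file that can be uploaded without changing --chunk-size. The arithmetic below is a sketch that assumes the two limits combine in the obvious way (file size divided by chunk size, rounded up, must not exceed 10,000); the function names are hypothetical.

```shell
#!/bin/sh
DEFAULT_CHUNK_BYTES=100000000   # default chunk size stated in the text
MAX_CHUNKS=10000                # chunk limit stated in the text

# chunks_needed FILE_BYTES CHUNK_BYTES: number of chunks, rounded up.
chunks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# min_chunk_bytes FILE_BYTES: smallest chunk size that stays within MAX_CHUNKS.
min_chunk_bytes() {
    echo $(( ($1 + MAX_CHUNKS - 1) / MAX_CHUNKS ))
}

# fits_default FILE_BYTES: succeed if the default chunk size is sufficient.
fits_default() {
    [ "$(chunks_needed "$1" "$DEFAULT_CHUNK_BYTES")" -le "$MAX_CHUNKS" ]
}

chunks_needed 1000000000 209715200   # a 1 GB file in 200MB chunks
min_chunk_bytes 2000000000000        # a 2 TB file needs larger chunks
```

Under these assumptions, files up to 100,000,000 × 10,000 = 1,000,000,000,000 bytes fit with the default chunk size; larger files need a larger --chunk-size.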

Setting Files as Hidden

By default, Upload Agent sets all files as visible. You can override this behavior with the --visibility flag.

Help String

The default number of compression threads is optimized for the system in use, so it varies from machine to machine.

Specification

Output

On successful completion, the file IDs of the newly created remote files are printed to standard output, one per line. If a particular file upload was unsuccessful, the string "Failed" is printed instead of the file ID. The lines are printed in the same order as the files were specified on the command line.

Errors

If an error occurs, Upload Agent does not exit immediately. Instead, it still uploads all the other files, prints "Failed" in place of the file ID for each failed upload, and then exits with a non-zero status code.
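In a script, both signals described above can be checked: the exit status and the "Failed" lines on standard output. The helper below is a sketch; `count_failed` is a hypothetical name.

```shell
#!/bin/sh
# count_failed: read ua's standard output from stdin and print how many
# lines are exactly the string "Failed".
count_failed() {
    # grep -c exits non-zero when the count is 0, so mask its status.
    grep -c -x 'Failed' || true
}

# Usage (requires a configured ua):
#   out=$(./ua my_file_1.txt my_file_2.txt); status=$?
#   failed=$(printf '%s\n' "$out" | count_failed)
#   if [ "$status" -ne 0 ] || [ "$failed" -gt 0 ]; then
#       echo "ua exited with status $status; $failed upload(s) failed" >&2
#   fi
```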

Non-Zero Error Code

The program exits with a non-zero error code if any of the following errors occur:

  • A valid authentication token was not provided.

  • A connection to the API server could not be made.

  • A file to be uploaded does not exist or is not accessible.

  • The user tries to upload the same file to a project more than once without setting --do-not-resume.

  • An unknown command line option or illegal value for an option is provided.

  • The project is not specified, the specified project does not exist, or the authentication token provided does not allow CONTRIBUTE access to the specified project.

  • The project specifier cannot be unambiguously resolved, for example, if two or more projects match the given project name.

  • A folder or file object could not be created.

  • A file could not be closed. This occurs when the /file-xxxx/close API call fails.

  • An error occurs while compressing a chunk, for example, the machine ran out of memory.

File Not Fully Uploaded

A file may not be fully uploaded if any of the following errors occur:

  • If the same local file has been uploaded to a project more than once (either partially or fully) and --do-not-resume is not set, Upload Agent may not be able to determine which remote file to resume. In this case, the upload may not complete.

  • A chunk fails to upload after the specified number of retry attempts.

  • A file could not be closed because one of the chunks was compressed below the 5MB limit. In this case, try uploading the failed file again, either with the --do-not-compress option or with a larger --chunk-size.
