Upload Agent

Introduction

The DNAnexus Upload Agent is a fast and convenient command-line client that can be used to upload files to DNAnexus. For uploading multiple or large files, Upload Agent is particularly recommended due to its ability to resume previously interrupted uploads.

Installing Upload Agent: Follow the instructions on the Upload Agent download page to download the Upload Agent executable.

NOTE: For the rest of this document, we will use ua to represent the Upload Agent executable, but you should replace it with the path to where you have saved the Upload Agent executable on your local file system.

Basic Usage

Synopsis

$ ./ua [options] <file> [...]

Usage

In the examples below, we assume that you have appropriately set your environment variables. Specifically, we assume that the authentication token and current workspace (project) are set. Remember that you can always override these environment variables by using the --auth-token and --project command-line options.

To see the current environment variables being used by ua, run:

$ ./ua --env
  API server protocol: https
  API server host:   api.dnanexus.com
  API server port:    ---
  Auth token: <TOKEN>
  Current Project: my_project (project-xxxx)

Running a Simple Diagnostic Test

Running the Upload Agent with the --test flag will run a simple test to verify that ua is correctly configured. The output of a successful configuration will look similar to the output below. Upload Agent will print any errors as part of the output.

$ ./ua --test
Upload Agent Version: 1.5.33
  git version: v0.304.0
  libboost version: 1.55.0
  libcurl version: 7.45.0
Upload Agent v1.5.33, environment info:
  API server protocol: https
  API server host:     api.dnanexus.com
  API server port:     443
Your copy of Upload Agent is up to date.
System Messages:

  There are currently no system messages.
Current User: user-xxxx
Current Project: xxxx (project-xxxx)
Proxy Settings:
  No proxy set in environment.
Operating System:
  Name:    Linux
  Release: 6.6.12-linuxkit
  Version: #1 SMP Fri Jan 19 08:53:17 UTC 2024
  Machine: x86_64
CA Certificate: /etc/ssl/certs/ca-certificates.crt
Resolving Amazon S3:
  Resolved to 54.231.130.232
Testing connection:
  Successfully contacted google.com over http: (200)
  Successfully contacted google.com over https: (200)

What to do next:
  1. Run the Upload Agent with the -v flag to get verbose output:
    ua -v --test
    ua -v <filename>
  2. Set DX_LIBCURL_VERBOSE environment to 1 and repeat the upload attempt to get libcurl debug logs:
    DX_LIBCURL_VERBOSE=1 ./ua <filename>

Uploading a Single File

You can upload a single file using the Upload Agent. In the following example, we upload a local file named my_file.txt to the project called "my_project".

$ ./ua --auth-token <TOKEN> --project my_project my_file.txt
Uploading file my_file.txt to file object file-xxxx
File "my_file.txt" was uploaded successfully. Closing...

file-xxxx

Now, if the project's files are viewed from the web UI or via the command line using the dx ls command, there is a new file with the name my_file.txt.gz. Upload Agent automatically compressed the file my_file.txt and appended the .gz extension during upload.

Uploading Multiple Files to the Same Project

Upload Agent can upload multiple files to the same project. In the following example, we upload two local files, my_file_1.txt and my_file_2.txt, to the project "my_project".

$ ./ua --auth-token <TOKEN> --project my_project my_file_1.txt my_file_2.txt

Uploading file my_file_1.txt to file object file-xxxx
Uploading file my_file_2.txt to file object file-yyyy
File "my_file_1.txt" was uploaded successfully. Closing...

File "my_file_2.txt" was uploaded successfully. Closing...

file-xxxx
file-yyyy

File IDs are output in the same order as the files were given on the command line. In the above example, the first and second lines correspond to the new file IDs generated by uploading my_file_1.txt and my_file_2.txt, respectively.
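Because the IDs come back in input order, a script can pair each local file name with its ID. The following is a minimal sketch, not part of Upload Agent: pair_ids is a hypothetical helper, and it assumes that ID lines look like file-xxxx or the word Failed (as described under Specification) and that any other output lines can be ignored.

```shell
# pair_ids: hypothetical helper, not part of Upload Agent.
# Reads ua's output on stdin and pairs each ID line with the file names
# given as arguments, in order. Lines that are neither an ID nor "Failed"
# are ignored.
pair_ids() {
  grep -E '^(file-[^[:space:]]+|Failed)$' | {
    for name in "$@"; do
      if IFS= read -r id; then
        printf '%s\t%s\n' "$name" "$id"
      else
        printf '%s\t%s\n' "$name" "(no ID found)"
      fi
    done
  }
}

# Demonstration with simulated output in place of a real ua run.
printf 'file-xxxx\nfile-yyyy\n' | pair_ids my_file_1.txt my_file_2.txt
```

With a real upload, the same helper could be used as ./ua my_file_1.txt my_file_2.txt | pair_ids my_file_1.txt my_file_2.txt, passing the file names in the same order to both commands.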

Uploading Directories

Upload Agent can upload all the files in a given directory. The destination of the files depends on the directory name given as input. If the name has a trailing /, Upload Agent does not create the directory; it copies the directory's contents to the destination path on the platform.

$ ./ua --auth-token <TOKEN> dir_name/
Uploading file dir_name/test_file.txt to file object file-xxxx
File "dir_name/test_file.txt" was uploaded successfully. Closing...

file-xxxx

Without a trailing /, a new remote directory will be created and the files will be uploaded to the new directory (dir_name).

$ ./ua --auth-token <TOKEN> dir_name
Uploading file dir_name/test_file.txt to file object file-xxxx
File "dir_name/test_file.txt" was uploaded successfully. Closing...

file-xxxx

NOTE: You can upload at most 1000 files in a single operation.
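Before uploading a large directory, it can help to check the file count against this limit. The sketch below is illustrative only: within_limit is a hypothetical helper, and it assumes that the limit applies to the regular files directly inside the directory (a non-recursive upload) and that no file name contains a newline.

```shell
# within_limit: hypothetical helper, not part of Upload Agent.
# Succeeds if DIR holds at most LIMIT regular files directly inside it
# (subdirectories are not descended into). LIMIT defaults to 1000.
within_limit() {
  dir=$1
  limit=${2:-1000}
  count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
  [ "$count" -le "$limit" ]
}
```

For example, within_limit dir_name && ./ua dir_name would start the upload only when the directory is within the limit.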

Uploading Directories Recursively

Upload Agent can upload a directory recursively using the --recursive flag. The destination directory follows the same rules as above: with a trailing /, ua assumes that the destination directory exists; without a trailing /, ua creates a new directory if it does not already exist.

$ ./ua --auth-token <TOKEN> dir_name --recursive
Uploading file dir_name/first_file.txt to file object file-xxxx
Uploading file dir_name/second_file.txt to file object file-yyyy
Uploading file dir_name/log/log_file.txt to file object file-zzzz
File "dir_name/first_file.txt" was uploaded successfully. Closing...

File "dir_name/second_file.txt" was uploaded successfully. Closing...

File "dir_name/log/log_file.txt" was uploaded successfully. Closing...

file-xxxx
file-yyyy
file-zzzz

Help String

  -h [ --help ]                      Produce a help message
  --version                          Print the version
  -e [ --env ]                       Print environment information
  -a [ --auth-token ] arg            Specify the authentication token
  -p [ --project ] arg               Name or ID of the destination project
  -f [ --folder ] arg (=/)           Name of the destination folder
  -n [ --name ] arg                  Name of the remote file (Note: Extension
                                     ".gz" will be appended if the file is
                                     compressed before uploading)
  --visibility arg (=visible)        Use "--visibility hidden" to set the
                                     file's visibility as hidden.
  --property arg                     Key-value pair to add as a property;
                                     repeat as necessary, e.g. "--property
                                     key1=val1 --property key2=val2"
  --type arg                         Type of the data object; repeat as
                                     necessary, e.g. "--type type1 --type
                                     type2"
  --tag arg                          Tag of the data object; repeat as
                                     necessary, e.g. "--tag tag1 --tag tag2"
  --details arg                      JSON to store as details
  --recursive                        Recursively upload the directories
  --read-threads arg (=2)            Number of parallel disk read threads
  -c [ --compress-threads ] arg (=3) Number of parallel compression threads
  -u [ --upload-threads ] arg (=8)   Number of parallel upload threads
  -s [ --chunk-size ] arg (=75M)     Size of chunks in which the file should be
                                     uploaded. Specify an integer size in bytes
                                     or append optional units (B, K, M, G).
                                     E.g., '50M' sets chunk size to 50
                                     megabytes.
  --throttle arg                     Limit maximum upload speed. Specify an
                                     integer to set speed in bytes/second or
                                     append optional units (B, K, M, G). E.g.,
                                     '3M' limits upload speed to 3
                                     megabytes/second. If not set, uploads are
                                     not throttled.
  -r [ --tries ] arg (=3)            Number of tries to upload each chunk
  --do-not-compress                  Do not compress file(s) before upload
  -g [ --progress ]                  Report upload progress
  -v [ --verbose ]                   Verbose logging
  --wait-on-close                    Wait for file objects to be closed before
                                     exiting
  --do-not-resume                    Do not attempt to resume any incomplete
                                     uploads
  --test                             Test upload agent settings
  -i [ --read-from-stdin ]           Read file content from stdin

The default number of compression threads is tuned to the system in use, so the value shown in your help output may differ from the one above.

Uploading Data from stdin

Upload Agent can upload data from stdin directly into a file. Note that when this option is used, only one file can be created. This option is very useful to pipe output from a program and upload it as a file.

$ my_application | ./ua --read-from-stdin my_file.txt
File "my_file.txt" was uploaded successfully. Closing...
file-xxxx

The following command reads data interactively from the terminal until the stream is terminated with <CTRL>+D, which signals the end of the file (EOF).

$ ./ua --read-from-stdin my_file.txt
> hello
> world
> <CTRL>+D  # EOF

Redirecting Uploaded Files

Redirecting to a Folder

You can change the final path of the file in the project via the flags --folder and --name. The following command uploads my_file_1.txt into the folder called oldData and behaves as if the file had been called file_1 (the new file name is file_1.gz, after compression).

$ ./ua --folder "/oldData" --name "file_1" my_file_1.txt
Uploading file my_file_1.txt to file object file-xxxx
File "my_file_1.txt" was uploaded successfully. Closing...

file-xxxx

Turning Off Automatic Compression

By default, Upload Agent compresses any file that is not already compressed before uploading it and appends .gz to the end of the file's name. You can override this behavior with the --do-not-compress flag.

$ ./ua --do-not-compress uncompressed.txt
Uploading file uncompressed.txt to file object file-xxxx
File "uncompressed.txt" was uploaded successfully. Closing...

file-xxxx

Preventing the Resumption of Previous Uploads

By default, Upload Agent attempts to resume all the uploads it can. If you would like to upload the same file twice, you can override this behavior with the --do-not-resume flag.

$ ./ua --do-not-resume dont_resume.txt
Uploading file dont_resume.txt to file object file-xxxx
File "dont_resume.txt" was uploaded successfully. Closing...

file-xxxx

If Upload Agent fails to upload a file, or uploads it only partially, you can resume the upload by running the same command again. To identify a file when resuming, Upload Agent generates a file signature from the following information:

  • size

  • modifiedTimestamp

  • toCompress (boolean indicating whether the file was originally uploaded with --do-not-compress)

  • chunkSize

  • the canonical path to the file

This information is summarized as a metadata field on the file object. When you upload a file using Upload Agent, it will quickly calculate this file signature and search your current project for any file with the same signature. If it finds such an object, and if the file upload is incomplete, it will try to resume the upload. If the file upload is complete, then the file signature is added as a property.
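To make the inputs to the signature concrete, the sketch below gathers the same five values for a local file. It is only an illustration: signature_inputs is a hypothetical helper, the key=value layout is invented here and is not Upload Agent's actual signature format, and the commands assume GNU coreutils (stat -c and realpath).

```shell
# signature_inputs: hypothetical helper, not part of Upload Agent.
# Prints the five values listed above for FILE, one per line.
# The output layout is made up for illustration only.
# Assumes GNU coreutils (stat -c %Y, realpath).
signature_inputs() {
  # $1 = file, $2 = chunk size in bytes, $3 = compression setting (true/false)
  printf 'size=%s\n' "$(wc -c < "$1" | tr -d ' ')"
  printf 'modifiedTimestamp=%s\n' "$(stat -c %Y "$1")"
  printf 'toCompress=%s\n' "$3"
  printf 'chunkSize=%s\n' "$2"
  printf 'path=%s\n' "$(realpath "$1")"
}
```

Changing any of these values between runs (for example, editing the file, moving it, or passing a different --chunk-size) would change the inputs, so a rerun would presumably not match the earlier partial upload.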

Waiting for a File to Close

When scripting, the ua command can wait until uploaded files are in the closed state before proceeding to the next command by using the --wait-on-close flag. You do not have to wait for a file to be closed before giving it as input to an app or applet, as the platform will automatically wait for the file to be closed before starting the job. However, if you would like to copy a file between projects, then you must wait for it to be in the closed state.

$ ./ua --wait-on-close close_me.txt
Uploading file close_me.txt to file object file-xxxx
File "close_me.txt" was uploaded successfully. Closing...

file-xxxx

Monitoring Upload Progress

You can turn on progress reporting (printed to stderr) with the --progress flag.

$ ./ua --progress large_file.txt

Uploading file large_file.txt to file object file-xxxx
large_file.txt 75.12% complete ... Average transfer speed =   3.58 MB/sec ... Instantaneous transfer speed =   3.77 MB/sec

Uploading Files With Metadata

Details

Assigning File Details

Upload Agent can set details for a file using the --details flag. The details must be passed as a valid JSON string. For more information about JSON, see the Wikipedia page on JSON.

$ ./ua myfile.txt --details '{"Field 1": [1,2,3], "Field 2": "content"}'
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...

file-xxxx
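Since --details requires valid JSON, a script may want to check the string before starting an upload. The following is a sketch under stated assumptions: is_valid_json is a hypothetical helper, and it relies on python3 being available on the PATH (json.tool is part of the Python standard library).

```shell
# is_valid_json: hypothetical helper, not part of Upload Agent.
# Succeeds if its argument parses as JSON.
# Assumes python3 is on PATH.
is_valid_json() {
  printf '%s' "$1" | python3 -m json.tool > /dev/null 2>&1
}

details='{"Field 1": [1,2,3], "Field 2": "content"}'
if is_valid_json "$details"; then
  echo "details OK"
else
  echo "details is not valid JSON" >&2
fi
```

A common mistake this catches is unquoted strings inside arrays, such as [A,B,C] instead of ["A","B","C"].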

Assigning Details to Multiple Files

The following command will set the same details to all the files being uploaded.

$ ./ua first_file.txt second_file.txt --details '{"Field 1": [1,2,3], "Field 2": "content"}'
Uploading file first_file.txt to file object file-xxxx
Uploading file second_file.txt to file object file-yyyy
File "first_file.txt" was uploaded successfully. Closing...

File "second_file.txt" was uploaded successfully. Closing...

file-xxxx
file-yyyy

Assigning Different Details to Multiple Files

$ ./ua first_file.txt second_file.txt --details '{"Field 1": [1,2,3], "Field 2": "content"}' --details '{"Field 3": ["A","B","C"], "Field 4": "content"}'
Uploading file first_file.txt to file object file-xxxx
Uploading file second_file.txt to file object file-yyyy
File "first_file.txt" was uploaded successfully. Closing...

File "second_file.txt" was uploaded successfully. Closing...

file-xxxx
file-yyyy

To assign different details to each file, provide one --details flag per uploaded file.

Properties

Upload Agent can assign properties to a file during upload using the --property flag.

Assigning a Property to a Single File

$ ./ua myfile.txt --property key=value
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...

file-xxxx

Assigning Multiple Properties to a Single File

$ ./ua myfile.txt --property property1=my_property --property property2="another property"
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...

file-xxxx

Advanced Usage

Changing the Number of Threads

You can specify a different number of threads for compression and a different number of outgoing HTTPS connections that will be opened to upload the file chunks by using the flags --compress-threads and --upload-threads, respectively. The number of threads used to read the input files can be changed by the --read-threads flag.

For example, if you are uploading some files from an eight-core machine, we recommend that you limit the usage to 75% of the machine's capabilities as a safety measure and evenly divide the usage amongst the three options. 75% of eight cores is six, so the number of threads for reading the input data (--read-threads), compressing (--compress-threads) and uploading (--upload-threads) the files would be two each. The command would look something like this:

$ ./ua --compress-threads 2 --upload-threads 2 --read-threads 2 file.txt
Uploading file file.txt to file object file-xxxx
File "file.txt" was uploaded successfully. Closing...

file-xxxx
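The rule of thumb above (use 75% of the cores, split evenly across reading, compressing and uploading) can be sketched as shell arithmetic. threads_per_stage is a hypothetical helper, and the floor of one thread per stage is an assumption added here for machines with very few cores.

```shell
# threads_per_stage: hypothetical helper, not part of Upload Agent.
# Takes 75% of the given core count, splits it evenly across the three
# stages (read, compress, upload), and never returns less than 1.
threads_per_stage() {
  cores=$1
  per_stage=$(( cores * 75 / 100 / 3 ))
  [ "$per_stage" -lt 1 ] && per_stage=1
  echo "$per_stage"
}

threads_per_stage 8   # prints 2, matching the eight-core example
```

On a system with GNU coreutils, n=$(threads_per_stage "$(nproc)") would derive the value from the current machine, and it could then be passed as --read-threads "$n" --compress-threads "$n" --upload-threads "$n".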

Using a Different Chunk Size

You can change the chunk size that is uploaded at a time in each thread using the flag --chunk-size. This parameter is dependent on the memory available on the machine. We highly recommend that you keep the default value. However, if your network connection is particularly slow, we suggest that you use a smaller chunk size.

The following command splits large-file.txt into chunks of 200MB (209,715,200 bytes) each for upload. The default chunk size is the value shown in the help string (75M in the output above). A file can be split into at most 10,000 chunks.

$ ./ua --chunk-size 209715200 large-file.txt
Uploading file large-file.txt to file object file-xxxx
File "large-file.txt" was uploaded successfully. Closing...

file-xxxx
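The relationship between file size, chunk size and the 10,000-chunk limit is simple ceiling division. The sketch below uses two hypothetical helpers, chunks_needed and min_chunk_size; it works on sizes in bytes and does not account for compression, which is an assumption about how chunks are counted.

```shell
# chunks_needed: number of chunks for a file of SIZE bytes at CHUNK bytes
# per chunk (ceiling division). Hypothetical helper for illustration.
chunks_needed() {
  size=$1
  chunk=$2
  echo $(( (size + chunk - 1) / chunk ))
}

# min_chunk_size: smallest chunk size in bytes that keeps a file of SIZE
# bytes within MAX chunks (10000 by default, the limit stated above).
min_chunk_size() {
  size=$1
  max=${2:-10000}
  echo $(( (size + max - 1) / max ))
}

chunks_needed 1000000000 209715200   # prints 5
```

For instance, at 209,715,200 bytes per chunk the limit is reached at 2,097,152,000,000 bytes, so a larger file would need a larger --chunk-size.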

Setting Files as Hidden

By default, Upload Agent sets all files as visible. You can override this behavior with the --visibility flag.

$ ./ua myfile.txt --visibility hidden
Uploading file myfile.txt to file object file-xxxx
File "myfile.txt" was uploaded successfully. Closing...

file-xxxx
$ dx ls
$ dx ls -a  # the -a flag shows hidden files
myfile.txt

Specification

Output

Upon successful completion, the file IDs of the newly created remote files are printed to standard output (each on a new line). If a particular file upload was unsuccessful, then the string "Failed" is printed instead of the file ID. The lines are printed in the same order as the files specified on the command line for upload.

Errors

In case an error occurs, Upload Agent does not exit immediately. Instead, all other files are still uploaded and the program exits with a non-zero status code, printing "Failed" instead of the file ID of the failed upload(s).
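Because a failed run exits with a non-zero status, and rerunning the same command resumes an incomplete upload (see Preventing the Resumption of Previous Uploads), a script can retry a failed upload a few times. The sketch below is a generic wrapper, not an Upload Agent feature: retry is a hypothetical helper, and it is best suited to single-file uploads, since some of the errors listed below (for example, an invalid token) will not be fixed by retrying.

```shell
# retry: hypothetical helper, not part of Upload Agent.
# Runs a command up to N times and stops at the first run that exits
# with status 0. Returns 1 if every attempt fails.
retry() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
  done
  return 1
}
```

For example, retry 3 ./ua large_file.txt would run the upload up to three times.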

Non-Zero Error Code

The program exits with a non-zero error code if any of the following errors occur:

  • A valid authentication token was not provided.

  • A connection to the API server could not be made.

  • A file to be uploaded does not exist or is not accessible.

  • The same file is uploaded to a project more than once while --do-not-resume is not set.

  • An unknown command line option or illegal value for an option is provided.

  • The project is not specified; the specified project does not exist; or the authentication token provided does not permit CONTRIBUTE access to the specified project.

  • The project specifier cannot be unambiguously resolved, e.g. if two or more projects match the given project name.

  • A folder or file object could not be created.

  • A file could not be closed (i.e., the /file-xxxx/close API call failed).

  • An error occurs while compressing a chunk, e.g., the machine ran out of memory.

File Not Fully Uploaded

A file may not be fully uploaded if any of the following errors occur:

  • There is an ambiguity in resolving the resume target of a local file and --do-not-resume is not set, e.g. if a local file has been uploaded more than once to a project (partially or fully) and it cannot be unambiguously determined which remote file upload should be resumed.

  • A chunk could not be uploaded in the specified number of tries.

  • A file could not be closed because one of the chunks was compressed below the 5MB limit. In this case, try uploading the failed file again with either the --do-not-compress option or a larger --chunk-size.

Copyright 2024 DNAnexus