Upload Agent
Introduction
The DNAnexus Upload Agent is a fast and convenient command-line client that can be used to upload files to DNAnexus. For uploading multiple or large files, Upload Agent is particularly recommended due to its ability to resume previously interrupted uploads.
Installing Upload Agent: Follow the instructions on the Upload Agent download page to download the Upload Agent executable.
NOTE: For the rest of this document, we use ua to represent the Upload Agent executable. Replace it with the path where you saved the Upload Agent executable on your local file system.
Basic Usage
Synopsis
Usage
In the examples below, we assume that your environment variables are set appropriately; specifically, that the authentication token and the current workspace (project) are set. You can always override these environment variables with the --auth-token and --project command-line options.
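If you drive ua from a script, you can add the overrides only when they are needed. The following Python sketch builds an argument list suitable for subprocess.run. The helper name build_ua_command is hypothetical. Any flag it leaves out falls back to the environment variables described above.

```python
from typing import List, Optional


def build_ua_command(files: List[str],
                     ua_path: str = "ua",
                     auth_token: Optional[str] = None,
                     project: Optional[str] = None) -> List[str]:
    """Build an argv list for Upload Agent (hypothetical helper).

    --auth-token and --project are added only when given; otherwise
    ua uses the token and project from the environment.
    """
    cmd = [ua_path]
    if auth_token is not None:
        cmd += ["--auth-token", auth_token]
    if project is not None:
        cmd += ["--project", project]
    return cmd + list(files)


print(build_ua_command(["my_file.txt"], project="my_project"))
# ['ua', '--project', 'my_project', 'my_file.txt']
```

Passing a list instead of a single shell string avoids quoting problems when project names contain spaces.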
To see the current environment variables being used by ua, run:
Running a Simple Diagnostic Test
Running Upload Agent with the --test flag runs a simple test to verify that ua is correctly configured. The output of a successful configuration looks similar to the output below. Upload Agent prints any errors as part of the output.
Uploading a Single File
You can upload a single file using Upload Agent. In the following example, we upload a local file named my_file.txt to the project called "my_project".
If you now view the project's files in the web UI, or on the command line with the dx ls command, you will see a new file named my_file.txt.gz. Upload Agent automatically compressed my_file.txt and appended the .gz extension during upload.
Uploading Multiple Files to the Same Project
Upload Agent can upload multiple files to the same project. In the following example, we upload two local files, my_file_1.txt and my_file_2.txt, to the project "my_project".
File IDs are printed in the same order as the files given on the command line. In the example above, the first and second lines are the file IDs generated by uploading my_file_1.txt and my_file_2.txt, respectively.
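Because the order of the output lines matches the input order, a script can pair each local file with its new file ID. Here is a minimal Python sketch. The function name pair_file_ids is hypothetical, and the "Failed" marker comes from the Output section of this page.

```python
from typing import Dict, List, Optional


def pair_file_ids(local_files: List[str],
                  stdout_text: str) -> Dict[str, Optional[str]]:
    """Map each local file to the file ID ua printed for it.

    Relies on ua printing one line per input file, in command-line
    order. A line reading "Failed" is mapped to None.
    """
    lines = [line.strip() for line in stdout_text.splitlines() if line.strip()]
    if len(lines) != len(local_files):
        raise ValueError(f"expected {len(local_files)} lines, got {len(lines)}")
    return {path: (None if line == "Failed" else line)
            for path, line in zip(local_files, lines)}
```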
Uploading Directories
Upload Agent can upload all the files in a given directory. Where the files end up depends on the directory name given as input. If the name has a trailing /, Upload Agent does not create the directory; it copies the contents of the folder to the destination path on the platform. Without a trailing /, Upload Agent creates a new remote directory (dir_name) and uploads the files into it.
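The trailing-slash rule can be expressed as a small function. In this Python sketch, remote_destination and its dest_folder parameter are hypothetical names, and dest_folder stands in for the destination path on the platform.

```python
import posixpath


def remote_destination(local_dir_arg: str, dest_folder: str = "/") -> str:
    """Return the remote folder that receives the directory's files.

    With a trailing "/", the contents go straight into dest_folder.
    Without it, a new folder named after the directory is created there.
    """
    if local_dir_arg.endswith("/"):
        return dest_folder
    dir_name = posixpath.basename(local_dir_arg)
    return posixpath.join(dest_folder, dir_name)
```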
NOTE: You can upload at most 1000 files in a single operation.
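To stay under this limit with a larger file list, a script can split the list into batches and run ua once per batch. Below is a minimal Python sketch; the names batches and MAX_FILES_PER_RUN are hypothetical.

```python
from typing import Iterator, List

MAX_FILES_PER_RUN = 1000  # limit stated in the note above


def batches(paths: List[str],
            size: int = MAX_FILES_PER_RUN) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` paths, preserving order."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]
```

Each batch can then be passed to its own ua invocation. Because order is preserved within a batch, the printed file IDs still line up with the inputs.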
Uploading Directories Recursively
Upload Agent can upload a directory recursively using the --recursive flag. The destination directory follows the same rules as above: with a trailing /, ua assumes that the destination directory exists; without the trailing /, Upload Agent creates a new directory if it does not already exist.
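To preview where files will land before running a recursive upload, you can walk the local tree. This Python sketch assumes that --recursive mirrors the local subdirectory layout under the destination. The text above does not spell this out, so treat that as an assumption. The function name plan_recursive_upload is hypothetical.

```python
import os
from typing import List, Tuple


def plan_recursive_upload(local_dir: str) -> List[Tuple[str, str]]:
    """List (local_path, remote_folder) pairs for a recursive upload.

    Assumption: subdirectories are mirrored remotely, and the trailing
    "/" rule decides whether the top-level directory name is kept.
    """
    root = local_dir.rstrip("/") or "/"
    keep_top = not local_dir.endswith("/")
    top = os.path.basename(root)
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        if keep_top:
            parts = [top] + parts
        remote = "/" + "/".join(parts)
        for name in filenames:
            plan.append((os.path.join(dirpath, name), remote))
    return sorted(plan)
```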
Help String
The default number of compression threads is tuned automatically and depends on the system in use.
Uploading Data from stdin
Upload Agent can upload data from stdin directly into a file. When this option is used, only one file can be created. This option is useful for piping the output of a program into a file on the platform.
This command reads data interactively from the terminal until the stream is terminated with <CTRL>+D, which signals the end of file (EOF).
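When the data comes from another program rather than the terminal, the usual approach is a pipe, as in `my_program | ua ...`. The Python sketch below does the same thing without a shell. The function name pipe_into is hypothetical. The exact name of ua's read-from-stdin option is not given on this page, so check ua --help before filling in the consumer command.

```python
import subprocess
from typing import List


def pipe_into(producer: List[str], consumer: List[str]) -> int:
    """Stream producer's stdout into consumer's stdin, like `producer | consumer`.

    Returns the consumer's exit status.
    """
    prod = subprocess.Popen(producer, stdout=subprocess.PIPE)
    cons = subprocess.Popen(consumer, stdin=prod.stdout)
    # Close our copy of the pipe so the producer gets SIGPIPE
    # if the consumer exits early.
    prod.stdout.close()
    cons.wait()
    prod.wait()
    return cons.returncode
```

For an upload, `consumer` would be the full ua command line including its stdin option. Data is streamed through the pipe, so the output never needs to fit in memory or on local disk.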
Redirecting Uploaded Files
Redirecting to a Folder
You can change the final path of the file in the project with the --folder and --name flags. The following command uploads my_file_1.txt into the folder called oldData as if the file had been named file_1 (after compression, the new file name is file_1.gz).
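The Python sketch below predicts the final remote path from the folder, the name, and the compression setting, following the example above. The function name remote_path is hypothetical. Upload Agent's real test for "already compressed" is not described here; the sketch simplifies it to "the name ends in .gz".

```python
import os
import posixpath
from typing import Optional


def remote_path(local_path: str, folder: str = "/",
                name: Optional[str] = None, compress: bool = True) -> str:
    """Predict where a file lands given --folder / --name style options.

    Assumption: ".gz" is appended only when compressing a file whose
    name does not already end in ".gz".
    """
    base = name if name is not None else os.path.basename(local_path)
    if compress and not base.endswith(".gz"):
        base += ".gz"
    return posixpath.join("/", folder.strip("/"), base)


print(remote_path("my_file_1.txt", folder="/oldData", name="file_1"))
# /oldData/file_1.gz
```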
Turning Off Automatic Compression
By default, Upload Agent compresses any file that is not already compressed before uploading it, and appends .gz to the file name. You can override this behavior with the --do-not-compress flag.
Preventing the Resumption of Previous Uploads
By default, Upload Agent attempts to resume every upload it can. If you want to upload the same file twice, you can override this behavior with the --do-not-resume flag.
If Upload Agent fails to upload a file, or has only partially uploaded it, you can resume the upload by running the same command again. To resume an upload, Upload Agent generates a file signature from the following information:
size
modifiedTimestamp
toCompress (a boolean reflecting whether the original upload used --do-not-compress)
chunkSize
the canonical path to the file
This information is stored as a metadata field on the file object. When you upload a file, Upload Agent quickly calculates this file signature and searches the current project for any file with the same signature. If it finds such a file and that file's upload is incomplete, it tries to resume the upload. If the file upload is complete, the file signature is added as a property.
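To show how these fields behave, here is a Python sketch that derives a signature from the five fields listed above. The JSON encoding, the SHA-256 hash, and the function name upload_signature are assumptions made for illustration. Upload Agent's actual signature format is not described here.

```python
import hashlib
import json
import os


def upload_signature(path: str, to_compress: bool, chunk_size: int) -> str:
    """Build a resume signature from size, mtime, toCompress, chunkSize, path.

    Assumption: the fields are JSON-encoded and hashed with SHA-256;
    the real Upload Agent encoding likely differs.
    """
    st = os.stat(path)
    fields = {
        "size": st.st_size,
        "modifiedTimestamp": int(st.st_mtime),
        "toCompress": to_compress,
        "chunkSize": chunk_size,
        "path": os.path.realpath(path),  # canonical path
    }
    encoded = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

The sketch makes the resume behavior easy to reason about. Rerunning the same command on an unchanged file reproduces the signature. Changing the file, its compression setting, or the chunk size produces a new signature, so no resume target will match.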
Waiting for a File to Close
When scripting, you can make ua wait until the uploaded files are in the closed state before moving on to the next command by using the --wait-on-close flag. You do not have to wait for a file to close before giving it as input to an app or applet, because the platform automatically waits for the file to close before starting the job. However, if you want to copy a file between projects, you must wait for it to be in the closed state.
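If a file was uploaded without --wait-on-close and you later need it closed (for example, before copying it between projects), waiting comes down to polling the file's state. Here is a minimal Python sketch. The get_state callable is a hypothetical stand-in for whatever your script uses to query the file's state through the platform; it is not part of Upload Agent.

```python
import time
from typing import Callable


def wait_until_closed(get_state: Callable[[], str],
                      timeout_s: float = 600.0,
                      poll_s: float = 5.0) -> None:
    """Poll get_state() until it returns "closed", or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "closed":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {state!r} after {timeout_s} s")
        time.sleep(poll_s)
```

time.monotonic is used so that system clock changes do not shorten or extend the timeout.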
Monitoring Upload Progress
You can turn on progress reporting (printed to stderr) with the --progress flag.
Uploading Files With Metadata
Details
Assigning File Details
Upload Agent can set details for a file using the --details flag. The details must be passed as a valid JSON string. For more information about JSON, see the Wikipedia page on JSON.
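Building the JSON string by hand is error-prone, especially once shell quoting is involved. The Python sketch below serializes a dictionary instead. The function name details_args is hypothetical.

```python
import json
from typing import Any, Dict, List


def details_args(details: Dict[str, Any]) -> List[str]:
    """Return ["--details", <json>] for a details object.

    json.dumps raises TypeError for values JSON cannot represent
    (sets, bytes, ...), and allow_nan=False makes it raise ValueError
    for NaN/Infinity, which are not valid JSON.
    """
    return ["--details", json.dumps(details, separators=(",", ":"),
                                    allow_nan=False)]


print(details_args({"sample": "s1", "lane": 3}))
# ['--details', '{"sample":"s1","lane":3}']
```

Appending this list to an argv list passed to subprocess.run sends the JSON to ua unchanged, with no shell quoting needed.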
Assigning Details to Multiple Files
The following command sets the same details on all the files being uploaded.
Assigning Different Details to Multiple Files
To assign different details to each file, provide one --details flag per uploaded file.
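Here is a Python sketch that builds such a command from parallel lists of files and details. It assumes that the i-th --details flag applies to the i-th file. The text above only says that one flag is needed per file, so confirm the pairing before relying on it. The function name per_file_details_command is hypothetical.

```python
import json
from typing import Any, Dict, List


def per_file_details_command(files: List[str],
                             details: List[Dict[str, Any]]) -> List[str]:
    """Build a ua command with one --details flag per uploaded file.

    Assumption: flags and files are matched by position.
    """
    if len(files) != len(details):
        raise ValueError("provide exactly one details object per file")
    cmd = ["ua"]
    for d in details:
        cmd += ["--details", json.dumps(d)]
    return cmd + files
```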
Properties
Upload Agent can assign properties to a file during upload using the --property flag.
Assigning a Property to a Single File
Assigning Multiple Properties to a Single File
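Properties are passed with repeated --property flags. This Python sketch assumes that each flag takes a single key=value pair (--property key=value). That format is an assumption not stated on this page, so confirm it with ua --help. The function name property_args is hypothetical.

```python
from typing import Dict, List


def property_args(props: Dict[str, str]) -> List[str]:
    """Turn a dict into repeated --property key=value flags.

    Assumption: one key=value pair per --property flag.
    """
    args: List[str] = []
    for key, value in props.items():
        if "=" in key:
            # An "=" in the key would make the pair ambiguous.
            raise ValueError(f"property key may not contain '=': {key!r}")
        args += ["--property", f"{key}={value}"]
    return args
```

A single property yields one flag, and multiple properties yield one flag each, in dictionary order.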
Advanced Usage
Changing the Number of Threads
You can set the number of compression threads and the number of outgoing HTTPS connections opened to upload the file chunks with the --compress-threads and --upload-threads flags, respectively. The number of threads used to read the input files can be changed with the --read-threads flag.
For example, if you are uploading files from an eight-core machine, we recommend limiting usage to 75% of the machine's capacity as a safety measure and dividing it evenly among the three options. The number of threads for reading the input data (--read-threads), compressing (--compress-threads), and uploading (--upload-threads) the files would then be two each, so the command would pass 2 to each of these three flags.
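The rule of thumb in the previous paragraph works out as follows: 75% of 8 cores is 6, and 6 split across 3 options is 2 each. This Python sketch generalizes that arithmetic to other core counts. The function name threads_per_option is hypothetical, and the floor of one thread per option is an added safeguard, not a documented rule.

```python
def threads_per_option(cores: int, usage_fraction: float = 0.75,
                       options: int = 3) -> int:
    """Split a fraction of the machine's cores evenly across the options.

    Rounds down, but never returns fewer than one thread per option.
    """
    budget = int(cores * usage_fraction)
    return max(1, budget // options)


n = threads_per_option(8)  # 2 on an eight-core machine
thread_flags = ["--read-threads", str(n),
                "--compress-threads", str(n),
                "--upload-threads", str(n)]
```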
Using a Different Chunk Size
You can change the size of the chunk that each thread uploads at a time with the --chunk-size flag. A suitable value depends on the memory available on the machine. We highly recommend keeping the default value; however, if your network connection is particularly slow, we suggest using a smaller chunk size.
For example, uploading large-file.txt with a chunk size of 200MB (209,715,200 bytes) splits it into chunks of that size. By default, the chunk size is ~95MB (100,000,000 bytes). A file can be split into at most 10,000 chunks.
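The chunk limit and the chunk size together cap the file size. At the default chunk size, that cap is 100,000,000 × 10,000 = 10^12 bytes. This Python sketch counts chunks and finds the smallest chunk size that keeps a given file within the limit. The helper names are hypothetical, and integer ceiling division avoids floating-point error on very large sizes.

```python
DEFAULT_CHUNK_SIZE = 100_000_000  # bytes (~95 MB), the default stated above
MAX_CHUNKS = 10_000               # maximum number of chunks per file


def chunk_count(file_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of chunks a file of file_size bytes is split into."""
    return -(-file_size // chunk_size)  # ceiling division


def min_chunk_size(file_size: int) -> int:
    """Smallest chunk size (bytes) that keeps the file within MAX_CHUNKS."""
    return -(-file_size // MAX_CHUNKS)
```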
Setting Files as Hidden
By default, Upload Agent makes all files visible. You can override this behavior with the --visibility flag.
Specification
Output
Upon successful completion, the file IDs of the newly created remote files are printed to standard output, one per line. If a particular file upload is unsuccessful, the string "Failed" is printed instead of its file ID. The lines are printed in the same order as the files specified on the command line.
Errors
If an error occurs, Upload Agent does not exit immediately. It still uploads all the other files, then exits with a non-zero status code and prints "Failed" in place of the file ID of each failed upload.
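A script can combine the exit status with the "Failed" markers to tell partial failures apart from failures of the whole run. This Python sketch reports which files failed, and raises an error when ua exits non-zero without printing any "Failed" line (for example, when the token or options are invalid). The function name upload_and_report is hypothetical, and cmd is whatever ua command line you built.

```python
import subprocess
from typing import List


def upload_and_report(cmd: List[str], local_files: List[str]) -> List[str]:
    """Run an upload command and return the local files whose upload failed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # stdout holds one file ID (or "Failed") per input file, in input order;
    # stderr (e.g. --progress output) is captured separately.
    ids = result.stdout.splitlines()
    failed = [f for f, line in zip(local_files, ids) if line.strip() == "Failed"]
    if result.returncode != 0 and not failed:
        # Non-zero exit with no per-file failure: the whole run failed.
        raise RuntimeError(result.stderr.strip() or "upload command failed")
    return failed
```

The failed files can then be retried individually, for example with a different --chunk-size.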
Non-Zero Error Code
The program exits with a non-zero error code if any of the following errors occur:
A valid authentication token was not provided.
A connection to the API server could not be made.
A file to be uploaded does not exist or is not accessible.
The user tries to upload the same file to a project more than once without setting --do-not-resume.
An unknown command-line option or an illegal value for an option is provided.
The project is not specified; the specified project does not exist; or the authentication token provided does not permit CONTRIBUTE access to the specified project.
The project specifier cannot be unambiguously resolved, e.g. if two or more projects match the given project name.
A folder or file object could not be created.
A file could not be closed (i.e., the /file-xxxx/close API call failed).
An error occurs while compressing a chunk, e.g., the machine ran out of memory.
File Not Fully Uploaded
A file may not be fully uploaded if any of the following errors occur:
The resume target for a local file is ambiguous and --do-not-resume is not set, e.g. if a local file has been uploaded to a project more than once (partially or fully) and Upload Agent cannot unambiguously determine which remote file upload to resume.
A chunk could not be uploaded within the specified number of tries.
A file could not be closed because one of its chunks was compressed to less than the 5MB minimum. In this case, try uploading the failed file again with either the --do-not-compress option or a larger --chunk-size.
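Highly compressible data is what triggers this case. The Python sketch below estimates whether a chunk would fall under the limit after compression. It makes two assumptions: "5MB" means 5 × 1024 × 1024 bytes, and Python's gzip defaults are close enough to Upload Agent's compression. The function name is hypothetical, and the result is only an estimate.

```python
import gzip

MIN_COMPRESSED_CHUNK = 5 * 1024 * 1024  # "5MB", MiB interpretation assumed


def chunk_too_small_after_compression(chunk: bytes) -> bool:
    """Estimate whether gzip-compressing this chunk yields fewer than 5MB."""
    return len(gzip.compress(chunk)) < MIN_COMPRESSED_CHUNK
```

For example, 10MB of zero bytes compresses far below the limit. A larger --chunk-size gives each compressed chunk more input to work with, and --do-not-compress skips compression entirely.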