Files
A file object can be used to store an opaque array of bytes, which is what is traditionally known as a "file". File objects contain binary data, and are immutable. After a file has been uploaded, its contents cannot be modified.
Lifecycle
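File objects follow a three-state lifecycle that determines what actions you can perform:
Open - A newly created file object is in the "open" state. In this state, you can upload parts to the file using the upload method, but you cannot download it.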
Closing - When you call close, the file moves to the "closing" state while the system finalizes it. During this time, you cannot upload to or download from the file. This process can take seconds to minutes depending on file size.
Closed - Once finalization completes, the file enters the "closed" state. The file becomes available for download using the download method, but its contents cannot be modified.
Files that remain in the "open" or "closing" state with no activity for 24 hours are considered abandoned. The system sends a notification at 24 hours and deletes abandoned files after a few days.
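A client can mirror these lifecycle rules in a small lookup table before deciding which call to make. The following Python sketch is illustrative only: FileState and can_perform are hypothetical names (not part of any DNAnexus client library), and the table covers just the upload and download actions described above.

```python
from enum import Enum


class FileState(Enum):
    OPEN = "open"
    CLOSING = "closing"
    CLOSED = "closed"


# Actions permitted in each state (hypothetical table based on the lifecycle above).
ALLOWED_ACTIONS = {
    FileState.OPEN: {"upload"},        # parts can be uploaded; no download yet
    FileState.CLOSING: set(),          # finalization in progress: no upload or download
    FileState.CLOSED: {"download"},    # contents are immutable from here on
}


def can_perform(state: str, action: str) -> bool:
    """Return True if `action` is allowed for a file whose describe output reports `state`."""
    return action in ALLOWED_ACTIONS[FileState(state)]
```

For example, can_perform("closing", "download") is False, so a client should poll describe until the state becomes "closed".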

Uploading
Given the size of genomic datasets, transferring a large file in a single HTTP request is impractical. The DNAnexus Platform supports uploading files in multiple smaller parts, which enables robust, resumable, and parallel uploads. To encourage efficient uploads, the system restricts part sizes; by default, parts must be between 5 MiB and 5 GiB, although the last part may be smaller (see Limits on Parts).
The upload call takes arguments that identify the part to be uploaded, along with other information specific to that part. The server returns a preauthenticated upload URL specific to that file object and part index, together with headers that the client must provide with the subsequent HTTP PUT. The user then uploads the part by sending an HTTP PUT with the content of the part to that URL (for example, "curl -T <part-file> -X PUT <url>", adding each returned header with -H), without providing any other authentication headers. The same part can be uploaded multiple times by performing both an upload call and a matching PUT for that part more than once. Only the last successful PUT is considered canonical.
The close call performs finalization by concatenating the parts. Once the file is closed, the part distinction is removed and the original file becomes available for download using the download call. Parts are concatenated in order of ascending part index. Indices do not need to be consecutive.
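The ordering rule can be illustrated with a one-line model of what close does with the parts: sort by index, then join. This Python sketch is a hypothetical illustration (the real concatenation happens server-side); concatenate_parts is not a platform API.

```python
def concatenate_parts(parts: dict) -> bytes:
    """Join parts (index -> bytes) in ascending index order; indices need not be consecutive."""
    return b"".join(parts[index] for index in sorted(parts))
```

For instance, parts uploaded with indices 7, 1, and 3 are joined as 1, 3, 7, so gaps in the numbering are harmless.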
Closing a file object is only possible after all parts have been uploaded, that is, after the user has successfully performed a PUT to the URL received for every index supplied in any upload call. Closing does not complete until all parts have been successfully uploaded. If the user does not complete the upload of a part previously created through an upload call, the close call succeeds but the file remains in the "closing" state until it is marked as abandoned and deleted. Therefore, ensure all parts have been successfully uploaded before calling close.
The closing process takes from a few seconds to several minutes; larger files take longer.
The user can query the status of the file object by using the describe call.
Limits on Parts
The fileUploadParameters field in the /project-xxxx/describe output specifies file upload limits for the project or container:
Parts have a maximum size, in bytes
Parts may have a minimum size, in bytes
The completed file has a maximum size, in bytes
A maximum number of parts may be uploaded
A minimum number of parts may be required
See the documentation of /project-xxxx/describe for further details about how to interpret it. The client should call this route before beginning the upload to get the appropriate limits and break the file into appropriately sized chunks.
For reference, the default parameters (for projects whose region begins with aws:) are the following:
maximumPartSize: 5368709120 (5 GiB)
minimumPartSize: 5242880 (5 MiB)
maximumFileSize: 5497558138880 (5 TiB)
maximumNumParts: 10000
emptyLastPartAllowed: true
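A client can turn these limits into a chunking plan before calling upload. The following Python sketch assumes the limits have already been read from fileUploadParameters; plan_parts and the 64 MiB preferred part size are hypothetical choices, and a minimum number of parts (if a project requires one) is not handled.

```python
from typing import Optional

# Default fileUploadParameters for projects whose region begins with "aws:".
DEFAULT_AWS_LIMITS = {
    "maximumPartSize": 5368709120,     # 5 GiB
    "minimumPartSize": 5242880,        # 5 MiB
    "maximumFileSize": 5497558138880,  # 5 TiB
    "maximumNumParts": 10000,
    "emptyLastPartAllowed": True,
}


def plan_parts(file_size: int, limits: dict,
               preferred_part_size: int = 64 * 1024 * 1024) -> list:
    """Split file_size bytes into (index, offset, length) tuples within the limits.

    Every part except the last is exactly part_size bytes (>= minimumPartSize);
    the last part holds the remainder.
    """
    if file_size > limits["maximumFileSize"]:
        raise ValueError("file exceeds maximumFileSize")
    if file_size == 0:
        # A single empty part is only acceptable when emptyLastPartAllowed is true.
        if not limits.get("emptyLastPartAllowed", False):
            raise ValueError("an empty file needs emptyLastPartAllowed")
        return [(1, 0, 0)]
    max_size = limits["maximumPartSize"]
    part_size = max(limits.get("minimumPartSize", 1), min(preferred_part_size, max_size))
    max_parts: Optional[int] = limits.get("maximumNumParts")
    if max_parts is not None:
        # Grow the part size until the whole file fits in max_parts parts.
        part_size = max(part_size, -(-file_size // max_parts))  # ceiling division
    if part_size > max_size:
        raise ValueError("file cannot be split within maximumPartSize and maximumNumParts")
    return [
        (index, offset, min(part_size, file_size - offset))
        for index, offset in enumerate(range(0, file_size, part_size), start=1)
    ]
```

With the defaults, a 150 MiB file becomes three parts of 64, 64, and 22 MiB. Keeping all non-final parts the same size makes it easy to satisfy minimumPartSize, which only the highest-index part may undercut.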
Downloading
The download call returns a preauthenticated URL which can be used to download the file via a simple HTTP GET. The service behind that URL supports the "Range" header of the HTTP standard, allowing for any byte range to be downloaded, and enabling compatibility with download accelerators that fetch multiple ranges in parallel to increase throughput.
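To fetch a file in parallel, a client can split its size (reported by describe once the file is closed) into Range header values and request each range separately. This Python sketch only builds the header values; byte_ranges is a hypothetical helper, and the chunk size is the caller's choice.

```python
def byte_ranges(total_size: int, chunk_size: int) -> list:
    """Return HTTP Range header values ("bytes=start-end", end inclusive) covering total_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        f"bytes={start}-{min(start + chunk_size, total_size) - 1}"
        for start in range(0, total_size, chunk_size)
    ]
```

For a 10-byte file and 4-byte chunks this yields "bytes=0-3", "bytes=4-7", and "bytes=8-9"; HTTP byte ranges are inclusive, hence the "- 1".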
Removal From a Project
Removing an unclosed file object from the project triggers:
Invalidation of all upload and download URLs
Closure or failure (with a 500 code) of any existing URL connections
File API Method Specifications
API method: /file/new
Specification
Creates a new file object in the "open" state. Optionally specify an Internet Media Type to associate with the file. DNAnexus provides this in the "Content-Type:" HTTP header for download requests, allowing web browsers to identify file types and handle downloaded files appropriately. All values are accepted without further validation (and sent back as-is in the "Content-Type:" header when a file is downloaded), so long as they contain only characters in the ASCII range 33-126. If the "media" field is not provided, or is set to "", the system attempts to auto-detect the Internet Media Type.
Inputs
project
string ID of the project or container to which the new file should belong, such as the string "project-xxxx"
name
string (optional, default is the new ID) The name of the object
tags
array of strings (optional) Tags to associate with the object
types
array of strings (optional) Types to associate with the object
hidden
boolean (optional, default false) Whether the object should be hidden
properties
mapping (optional) Properties to associate with the object
key Property name
value string Property value
details
mapping or array (optional, default { }) JSON object or array that is to be associated with the object. See the Object Details section for details on valid input
folder
string (optional, default "/") Full path of the folder that is to contain the new object
parents
boolean (optional, default false) Whether all folders in the path provided in folder should be created if they do not exist
media
string (optional, default "") The Internet Media Type (formerly known as MIME type or Content-type) of the file
nonce
string (optional) Unique identifier for this request. Ensures that even if multiple requests fail and are retried, only a single file is created. For more information, see Nonces.
Outputs
id
string ID of the created file object, for example, a string in the form "file-xxxx"
Errors
InvalidInput
A reserved linking string ("$dnanexus_link") appears as a key in a hash in details but is not the only key in the hash
A reserved linking string ("$dnanexus_link") appears as the only key in a hash in details but has a value other than a string
The key "media" (if provided) contains at least one character outside of the ASCII range 33-126
For each property key-value pair, the size, encoded in UTF-8, of the property key may not exceed 100 bytes and the property value may not exceed 700 bytes
A nonce was reused in a request but other inputs had changed, signifying a new and different request
A nonce may not exceed 128 bytes
PermissionDenied
UPLOAD access required
File creation restricted to job context in externalUploadRestricted project
Project's defaultSymlink drive is not accessible to perform this action
Action failed because CreateMultiPartUpload is not available for this drive
InvalidType
project is not a project ID
ResourceNotFound
The specified project is not found
The route in folder does not exist, and parents is false
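Some of the InvalidInput conditions above can be checked locally before the request is sent. The following Python sketch builds an input mapping for /file/new; new_file_input is a hypothetical helper, the HTTP call to the API server is not shown, and only the media, property-size, and nonce rules are covered.

```python
import uuid
from typing import Dict, Optional


def new_file_input(project: str, name: Optional[str] = None, media: str = "",
                   properties: Optional[Dict[str, str]] = None) -> dict:
    """Build an input mapping for /file/new, pre-checking some InvalidInput rules."""
    # "media" may only contain characters in the ASCII range 33-126 (so no spaces).
    if any(not 33 <= ord(ch) <= 126 for ch in media):
        raise ValueError("media must contain only ASCII characters 33-126")
    for key, value in (properties or {}).items():
        # Property keys are limited to 100 bytes and values to 700 bytes (UTF-8).
        if len(key.encode("utf-8")) > 100 or len(value.encode("utf-8")) > 700:
            raise ValueError(f"property {key!r} exceeds the size limits")
    request = {
        "project": project,
        "media": media,
        # A fresh nonce (32 hex characters, well under 128 bytes) for this logical request.
        # Retries must resend the identical mapping, including this nonce.
        "nonce": uuid.uuid4().hex,
    }
    if name is not None:
        request["name"] = name
    if properties:
        request["properties"] = dict(properties)
    return request
```

Note that a media value such as "text/plain; charset=utf-8" is rejected by this check because it contains a space (ASCII 32).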
API method: /file-xxxx/upload
Specification
Informs the system that a file part (identified by a particular index) needs to be uploaded, and retrieves a "part upload URL" (specific to this part) for performing the upload of that part. This method needs to be called at least once during the file object lifecycle. Once this method is called for a particular index, then data for that part must be provided to the corresponding part upload URL before calling the "close" method.
The part upload URL returned by this method may refer to a different endpoint than the DNAnexus API server, and accepts HTTP PUT requests supplying the binary data for the file part. Any PUT request to the part upload URL must be initiated shortly after its generation, otherwise a new URL for the part must be generated with another call to upload. The PUT request MUST include all HTTP headers that are specified in the API server's response to upload (see below). A "Content-Type" header should not be supplied, since the Internet Media Type is not set separately for each part.
The part upload URL has support for CORS with the following configuration:
SSL is required (from an origin served over https)
Part uploads must use the HTTP PUT method
Allowed HTTP headers
content-length
origin
content-md5
accept
content-type
x-amz-server-side-encryption
A successful request to a part upload URL receives an HTTP response with a 2xx response code and blank response body. An unsuccessful upload receives an HTTP response with an error response code.
This method may be called multiple times with the same index parameter. The system maintains a state for each part, which can be either "pending" or "complete". The first time this method is called for an index, the state of that part is set to "pending". If the subsequent PUT to the part upload URL completes successfully, and no other request has been made to that URL in the meantime, the state is set to "complete". Users may also call this method again for the same part index (to reupload a piece); each subsequent call resets the state back to "pending". If multiple overlapping requests are made to the part upload URL, the last successful request is considered the canonical one, and the part becomes pending or complete based on the fate of that last request.
All parts, except the part with the highest index, have a minimum size given by the fileUploadParameters.minimumPartSize field of the /project-xxxx/describe output. If the fileUploadParameters.emptyLastPartAllowed field of the /project-xxxx/describe output has the value false, then the last part must contain at least 1 byte.
All parts have a maximum size given by the fileUploadParameters.maximumPartSize field of the /project-xxxx/describe output.
Inputs
size
int The size in bytes of this file part
md5
string Hex encoding of the file part's MD5 message-digest
index
int (optional, default 1) Number that determines the relative ordering of parts during the concatenation process that occurs in close. This must be at least 1, and at most the value fileUploadParameters.maximumNumParts returned by /project-xxxx/describe.
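The size and md5 inputs can be computed directly from the bytes of the part. This Python sketch builds the upload input for one part and checks the size rules early; upload_input is a hypothetical helper, the limits mapping is assumed to come from fileUploadParameters, and the minimum-size check for non-final parts is one the server only enforces at close.

```python
import hashlib


def upload_input(data: bytes, index: int, limits: dict, is_last: bool) -> dict:
    """Build the /file-xxxx/upload input for one part after checking the size rules."""
    size = len(data)
    if size > limits["maximumPartSize"]:
        raise ValueError("part exceeds maximumPartSize")
    if not is_last and size < limits.get("minimumPartSize", 0):
        raise ValueError("only the highest-index part may be smaller than minimumPartSize")
    if is_last and size == 0 and not limits.get("emptyLastPartAllowed", False):
        raise ValueError("the last part must contain at least 1 byte")
    max_parts = limits.get("maximumNumParts")
    if index < 1 or (max_parts is not None and index > max_parts):
        raise ValueError("index must be between 1 and maximumNumParts")
    # md5 is the hex encoding of the part's MD5 digest (32 hex characters).
    return {"size": size, "md5": hashlib.md5(data).hexdigest(), "index": index}
```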
Outputs
url
string A URL (of the https scheme) to which data may be sent via HTTP PUT
expires
timestamp Time at which url expires, typically a few minutes after generation
headers
mapping HTTP headers which must be supplied with any PUT request to url
key Header field name
value string Header value
These headers may contain authentication tokens. For security, do not store, log, print, or share them in any insecure way in production environments.
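The PUT itself can be sent with Python's standard http.client, which adds a Content-Length header for a bytes body but never a Content-Type (urllib.request, by contrast, adds a default Content-Type to requests with a body). This is a sketch, not a supported client: prepare_part_put and put_part are hypothetical names, and url and headers are assumed to be the values returned by upload.

```python
import http.client
from urllib.parse import urlsplit


def prepare_part_put(url: str, headers: dict) -> tuple:
    """Split a part upload URL into (host, path) and copy the required headers."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("part upload URLs use the https scheme")
    if any(key.lower() == "content-type" for key in headers):
        raise ValueError("do not send a Content-Type header with a part upload")
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    return parts.netloc, path, dict(headers)


def put_part(url: str, headers: dict, data: bytes, timeout: float = 300) -> int:
    """PUT one part's bytes to its upload URL and return the HTTP status code."""
    host, path, req_headers = prepare_part_put(url, headers)
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("PUT", path, body=data, headers=req_headers)
        resp = conn.getresponse()
        resp.read()  # the body is blank on success
        return resp.status
    finally:
        conn.close()
```

A 2xx status from put_part means the part can move to "complete". To stay within the expires time, call upload again for a fresh URL rather than reusing an old one on retry.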
Errors
PermissionDenied
UPLOAD access required
File upload restricted to job context in externalUploadRestricted project
InvalidInput
size must be a non-negative integer, no greater than fileUploadParameters.maximumPartSize
If fileUploadParameters.emptyLastPartAllowed is false, size must be at least min(fileUploadParameters.minimumPartSize, 1)
md5 must be a hex string of the appropriate length
index (if provided) must be a positive integer, no greater than fileUploadParameters.maximumNumParts
InvalidState
The file object is not in the open state
API method: /file-xxxx/describe
Specification
Describes a file object (see also /record-xxxx/describe). Among other fields, the response includes the Internet Media Type of the file and the state of the file object. If the file object is in the "closed" state, the file size is reported as well. If the "parts" input is true, or the file object is in the "open" state, the response contains a "parts" key whose value is a map describing the status of the parts that the system knows about. More specifically, for every part that the system has been informed of via an "upload" call, the "parts" map contains a key corresponding to the part index (represented as a string), whose value is a map with the part status: its state, size, and md5. The state can be either "pending" or "complete".
Alternatively, you can use the /system/describeDataObjects method to describe many data objects at once.
As mentioned in the description of the "upload" call, a part enters the "pending" state for any of the following reasons:
A PUT to its part upload URL has not been successfully completed.
An earlier PUT to its part upload URL has been successfully completed, but the request initiated last is either ongoing or failed.
A part enters the "complete" state after a successful PUT to its part upload URL. For completed parts, the "size" field shows the amount of data received and the "md5" field contains the MD5 hash of the received data. For parts in "pending" state, both fields are set to null.
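Because the "parts" map uses part indices represented as strings, a client looking for unfinished parts should convert the keys to integers before sorting (as strings, "10" sorts before "2"). This Python sketch assumes describe_output is the decoded JSON response of /file-xxxx/describe; pending_parts is a hypothetical helper.

```python
def pending_parts(describe_output: dict) -> list:
    """Return the indices of parts still in the "pending" state, in ascending numeric order."""
    parts = describe_output.get("parts", {})
    return sorted(int(index) for index, info in parts.items() if info["state"] == "pending")
```

Re-uploading exactly these indices (an upload call plus a matching PUT for each) is what has to happen before close can succeed.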
A project ID can be provided as a hint to request user-provided metadata from a particular project. If the specified project does not contain the object and another project is found containing it where the user has VIEW permissions, that other project is used to return the metadata. The response includes the project ID used to return the user-provided metadata, whether it matches the provided hint or not. Details can be requested via this method, but remain hidden if the requestor lacks VIEW access.
Third-party data providers can apply watermarks to files. A watermarked file's content depends on:
The file id
The watermarkId and watermarkVersion associated with the file in a specific project
Updates to the watermark version by the data provider, which alter the watermarked file content
Inputs
project
string (optional) Project or container ID to be used as a hint for finding the object in an accessible project. This field should be provided to get consistent output for watermarked files.
defaultFields
boolean (optional, default false if fields is supplied, true otherwise) Whether to include the default set of fields in the output (the default fields are described in the "Outputs" section below). The selections are overridden by any fields explicitly named in fields.
fields
mapping (optional) Include or exclude the specified fields from the output. These selections override the settings in defaultFields.
key Desired output field. See the "Outputs" section below for valid values here
value boolean Whether to include the field
The following options are deprecated (and are ignored when fields is present):
parts
boolean (optional, default true if file is in the "open" state and false otherwise) Whether additional information for each part should be returned
properties
boolean (optional, default false) Whether the properties should be returned
details
boolean (optional, default false) Whether the details should also be returned
Outputs
id
string The object ID, such as "file-xxxx"
The following fields are included by default (but can be disabled using fields or defaultFields):
project
string ID of the project or container in which the object was found
class
string The value "file"
types
array of strings Types associated with the object
created
timestamp Time at which this object was created
state
string The value "open", "closing", or "closed"
hidden
boolean Whether the object is hidden or not
links
array of strings The object IDs that are pointed to from this object
name
string The name of the object
folder
string The full path to the folder containing the object
sponsored
boolean Whether the object is sponsored by DNAnexus
tags
array of strings Tags associated with the object
modified
timestamp Time at which the user-provided metadata of the object was last modified
media
string The Internet Media Type of the file
archivalState
string The archival state of the file
createdBy
mapping How the object was created
user
string ID of the user who created the object or launched an execution which created the object
job
string (present if a job created the object) ID of the job that created the object
executable
string (present if a job created the object) ID of the app or applet that the job was running
drive
string The ID of the drive in which the file is located
symlinkPath
mapping Remote path of the symlink
container
string The container name. For AWS S3 this is region:bucket and for Azure Blob this is containerName.
object
string The remote path of the symlink
md5
string Hex encoding of the whole file's MD5 message-digest. This field applies only to readable symlink files.
The following field is included by default if the file is open:
parts
mapping Information on the file parts that have been or are being uploaded
key Part index that has been provided to any /file-xxxx/upload calls on the file so far
value mapping Information on the file part with key/values:
state
string Either "pending" or "complete"
size
int or null The size of the part (in bytes) if state is "complete". Null otherwise
md5
string or null The hexadecimal encoded value of the MD5 message-digest (as defined in RFC 1321) of the data if state is "complete". Null otherwise
The following field (included by default) is only available if the object is in the "closed" state:
size
int Size of the file in bytes
The following field (included by default) is available if the object is sponsored by a third party:
sponsoredUntil
timestamp Indicates the expiration time of data sponsorship (this field appears only for sponsored objects and specifies a future expiration time)
The following fields are only returned if the corresponding field in the fields input is set to true:
properties
mapping Properties associated with the object
key Property name
value string Property value
details
mapping or array Contents of the object's details
watermarkId
string ID of the watermark applied to the file's content during download
watermarkVersion
string Version of the watermark's content applied to the file's content during download
resolvedPolicies
mapping A mapping of policies that affect file-xxxx within the scope of a single project. You must specify project in the input to receive consistent results. Data providers can update policies at any time. Fields in this mapping include:
isExternalDownloadable
boolean True if file-xxxx can be downloaded, false otherwise.
Errors
ResourceNotFound
project, if specified, does not exist
PermissionDenied
VIEW access required to some project that contains the file object
If project is specified, VIEW access is required to that project
API method: /file-xxxx/close
Specification
Initiates finalization of the file object, if it is not already in the "closed" state.
To close a file object, there must be at least one part, and all parts must be in the "complete" state. If this call is successful, it returns immediately and the file object advances to the "closing" state. The system concatenates the parts in order of increasing part index (and those indices do not have to be consecutive). After completion, the file object advances to the "closed" state. For a more detailed discussion refer to the section "Uploading".
All parts, except the part with the highest index, have a minimum size given by the fileUploadParameters.minimumPartSize field of the /project-xxxx/describe output.
The part with the highest index must contain at least one byte if fileUploadParameters.emptyLastPartAllowed is false.
The total file size cannot exceed the size given by the fileUploadParameters.maximumFileSize field of the /project-xxxx/describe output.
If fileUploadParameters.emptyLastPartAllowed is true, there must be at least one part.
A call to this method on a closed file succeeds with a detail field set as shown in "Outputs" below.
Inputs
None
Outputs
id
string ID of the manipulated object, such as "file-xxxx"
If the object is in the closed state:
detail
string An explanatory message
Errors
PermissionDenied
UPLOAD access required
File closing restricted to job context in externalUploadRestricted project
InvalidState
fileUploadParameters.emptyLastPartAllowed is true and there are zero parts
At least one part is in the "pending" state
There exists a part, other than the one with the highest part index, whose size is less than fileUploadParameters.minimumPartSize bytes
fileUploadParameters.emptyLastPartAllowed is false and the part with the highest index has 0 bytes
The file has size larger than fileUploadParameters.maximumFileSize bytes
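Before calling close, a client can check the same conditions against the "parts" map from describe. This Python sketch is a hypothetical pre-flight check (close_problems is not a platform API); it assumes parts is that map (string indices as keys) and limits is fileUploadParameters, and it returns human-readable reasons instead of raising.

```python
def close_problems(parts: dict, limits: dict) -> list:
    """List the InvalidState conditions that a /file-xxxx/close call would hit."""
    if not parts:
        return ["there must be at least one part"]
    if any(info["state"] == "pending" for info in parts.values()):
        # Pending parts report null sizes, so the size checks below cannot run yet.
        return ["at least one part is pending"]
    problems = []
    ordered = sorted(parts.items(), key=lambda item: int(item[0]))  # numeric, not string, order
    *others, (_, last) = ordered
    if any(info["size"] < limits.get("minimumPartSize", 0) for _, info in others):
        problems.append("a part other than the highest-index one is below minimumPartSize")
    if last["size"] == 0 and not limits.get("emptyLastPartAllowed", False):
        problems.append("the highest-index part has 0 bytes")
    if sum(info["size"] for _, info in ordered) > limits["maximumFileSize"]:
        problems.append("the file is larger than maximumFileSize")
    return problems
```

An empty list means the parts satisfy the documented conditions; the server remains the authority on whether close succeeds.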
API method: /file-xxxx/download
Specification
Generates a "download URL" for downloading the contents of this file object. The download URL may refer to a different endpoint than the DNAnexus API server, and accepts HTTP GET requests.
Requests to the download URL must be initiated within the number of seconds specified in the "duration" input parameter (starting from the time this call is made, according to the server). After this duration, the URL expires. GET requests MUST include any headers specified in the API server's response to /file-xxxx/download (see below). The download URL also honors the HTTP Range request header, enabling clients to download only a particular byte range of the file.
Include project context in paths using formats like project-xxxx:file-yyyy or project-xxxx:/path/to/file.txt.
The download URL implements CORS support:
GET requests with an "Origin" header receive a matching "Access-Control-Allow-Origin" response header
OPTIONS preflight requests are accepted with "Access-Control-Request-Method: GET"
Preflight responses include:
"Access-Control-Allow-Origin": matches request Origin
"Access-Control-Allow-Headers": matches request Access-Control-Request-Headers
"Access-Control-Max-Age": 1 hour
Successful calls to the download URL receive the HTTP response code 200, and include a "Content-Type" header, set to whatever Internet Media Type was specified when the file object was created, and a "Content-Disposition: attachment" header that may also include a filename, if requested (see below). The request may include the query string "?inline" to override the Content-Disposition header. Unsuccessful requests receive an HTTP error response code (and in that case there are no guarantees about the response body, as the download URL does not necessarily conform to the general API rules regarding error messages).
Inputs
duration
int (optional, default is 3600 seconds (1 hour)) Number of seconds (starting from the time this call is made, according to the server) during which the generated URL is valid. The maximum allowed duration is specified by the maximumPreauthenticatedDuration org policy. Setting duration to 0 is equivalent to using the maximumPreauthenticatedDuration value.
Setting duration below roughly 300 seconds (5 minutes) may cause dependent functionality to break. For example, File Viewers and some automated tools may require URLs to remain valid for 3-5 minutes to complete downloads or viewing sessions. Ensure the duration is sufficient for all intended use cases.
filename
string (optional) The desired filename of the downloaded file, to be affixed to the returned URL. If provided, this filename is encoded as a URI component and affixed to the download URL, whose resource part then ends in, for example, '/filename', to ease downloads through web browsers and utilities such as wget.
project
string (optional) ID of a project containing the file, with which the download URL is associated. Requests to the download URL succeed only when the file resides in this project and the user who generated the URL has at least VIEW permission to this project. If this value is not provided, the URL remains valid as long as the file resides in any project where the user who generated the URL has at least VIEW permission. This field is required to get the download URL for a watermarked file when invoked outside the context of a DNAnexus job.
preauthenticated
boolean (optional, default false) Whether to generate a "preauthenticated" download URL, which embeds any necessary authentication information in the URL itself, rather than requiring separate request headers. Preauthenticated URLs grant access to file data to anyone who has the link. To protect sensitive information, avoid storing, logging, printing, or sharing these URLs in insecure ways, especially in production environments.
For improved security, always generate preauthenticated URLs that are specific to a project.
stickyIP
boolean (optional if preauthenticated is true, required to be false otherwise, default false) Whether HTTP GET requests to the preauthenticated download URL should be restricted to a single origin IP address. If stickyIP and preauthenticated are both true, the IP address of the first HTTP GET request to the preauthenticated download URL becomes the only allowed origin for subsequent requests.
Outputs
url
string An absolute URL for downloading the file via HTTP GET requests.
headers
mapping HTTP headers which MUST be supplied with any GET request to the url
key Header field name
value string Header value
The headers may contain authentication tokens. For security, do not store, log, print, or share them in any insecure way in production environments.
For preauthenticated URL requests, the headers contain no keys.
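A ranged GET needs the returned url, every header in the returned headers mapping (an empty mapping for preauthenticated URLs), and a Range header. The following Python sketch uses the standard http.client module; build_range_request and ranged_get are hypothetical names. The sketch accepts both 200 and 206, since standard HTTP servers answer a satisfiable Range request with 206 Partial Content.

```python
import http.client
from urllib.parse import urlsplit


def build_range_request(url: str, headers: dict, range_value: str) -> tuple:
    """Return (host, path, headers) for a GET of one byte range of the download URL."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("expected an https download URL")
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    # The returned headers MUST accompany every GET; Range selects the bytes.
    return parts.netloc, path, {**headers, "Range": range_value}


def ranged_get(url: str, headers: dict, range_value: str, timeout: float = 300) -> bytes:
    """Download one byte range, raising on any non-success status."""
    host, path, req_headers = build_range_request(url, headers, range_value)
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", path, headers=req_headers)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status not in (200, 206):
            raise RuntimeError(f"download failed with HTTP {resp.status}")
        return body
    finally:
        conn.close()
```

Several ranged_get calls for disjoint ranges can run concurrently and their results can be concatenated in range order, which is how download accelerators increase throughput.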
Errors
ResourceNotFound
project is specified but the file object is not in the specified project
PermissionDenied
VIEW access required to some project that contains the file object
If project is specified, VIEW access is required to that project
InvalidInput
duration (if provided) is not a positive integer
InvalidState
The file object is not in the "closed" state
File downloads in web applications
To generate non-preauthenticated file download URLs, web applications (running inside web browsers) should make /file-xxxx/download requests to the separate endpoint https://dl.dnanex.us instead of https://api.dnanexus.com. Browser requests to non-preauthenticated file download URLs are authenticated by a URL-specific cookie, set by the API server's response to the /file-xxxx/download route on this separate endpoint.
Non-browser-based applications implementing the above specification, or web applications that only need preauthenticated download URLs, may call /file-xxxx/download on https://api.dnanexus.com.