Files
A file object can be used to store an opaque array of bytes, which is what is traditionally known as a "file". File objects contain binary data, and are immutable. After a file has been uploaded, its contents cannot be modified.
Lifecycle
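File objects follow a three-state lifecycle that determines what actions you can perform:
Open - A newly created file starts in the "open" state. In this state, you can upload parts to the file, but you cannot download it.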
Closing - When you call close, the file moves to the "closing" state while the system finalizes it. During this time, you cannot upload to or download from the file. This process can take seconds to minutes depending on file size.
Closed - Once finalization completes, the file enters the "closed" state. The file becomes available for download using the download method, but its contents cannot be modified.
The system treats a file that remains in the "open" or "closing" state with no activity for 24 hours as abandoned. It sends a notification at the 24-hour mark and deletes abandoned files after a few days.

Uploading
Given the size of genomic datasets, transferring a large file over a single HTTP call is impractical. The DNAnexus Platform supports uploading files in multiple, smaller parts, which enables robust, resumable, and parallel uploads. To encourage efficient uploads, the system restricts part sizes, by default to between 5 MiB and 5 GiB (the part with the highest index may be smaller).
The upload call takes specific arguments that indicate which part is to be uploaded and other information specific to that part. The server returns a preauthenticated upload URL specific to that file object and part index, along with specific headers that the client must provide with the subsequent HTTP PUT. The user can then upload the part to that URL by doing an HTTP PUT with the content of the part (such as when using "curl -T -X PUT"), along with the headers returned to the client, without providing any other special authentication headers. Users are allowed to upload the same part multiple times (by performing both an upload and matching PUT for a part more than once). Only the last successful PUT is considered canonical.
The close call performs finalization by concatenating the parts. Once the file is closed, the part distinction is removed and the original file becomes available for download using the download call. Parts are concatenated in order of ascending part index. Indices do not need to be consecutive.
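The ordering rule can be illustrated with a small local simulation. This is only a sketch of the behavior described above, not the platform's implementation; the function name `assemble` and the sample payloads are hypothetical.

```python
def assemble(parts: dict[int, bytes]) -> bytes:
    """Concatenate part payloads in ascending part-index order."""
    return b"".join(parts[index] for index in sorted(parts))


# Indices need not be consecutive, and the order in which parts
# were uploaded does not affect the result.
uploaded = {10: b"CCC", 1: b"AAA", 4: b"BBB"}
result = assemble(uploaded)
```

Here `result` holds the payload of part 1, then part 4, then part 10.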
Closing a file object is only possible after all parts have been uploaded, that is, when for every index supplied in any upload call, the user has successfully performed a PUT to the respective URL received. Closing does not complete until all parts have been successfully uploaded. If the user does not complete a part upload for any file part previously created through an upload call, the close call succeeds but the file remains in "closing" state until marked as abandoned and deleted. Therefore, ensure all parts have been successfully uploaded before calling close.
The closing process takes a few seconds to minutes, depending on the file size. Larger files may require more time.
The user can query the status of the file object by using the describe call.
Limits on Parts
The fileUploadParameters field in the /project-xxxx/describe output specifies file upload limits for the project or container:
Parts have a maximum size, in bytes
Parts may have a minimum size, in bytes
The completed file has a maximum size, in bytes
A maximum number of parts may be uploaded
A minimum number of parts may be required
See the documentation of /project-xxxx/describe for further details about how to interpret it. The client should call this route before beginning the upload to get the appropriate limits and break the file into appropriately sized chunks.
For reference, the default parameters (for projects whose region begins with aws:) are the following:
maximumPartSize: 5368709120 (5 GiB)
minimumPartSize: 5242880 (5 MiB)
maximumFileSize: 5497558138880 (5 TiB)
maximumNumParts: 10000
emptyLastPartAllowed: true
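As a rough sketch of how a client might break a file into parts under these limits, the following helper computes part boundaries. The function name `plan_parts`, the `preferred_part_size` parameter, and its 100 MiB default are assumptions made for illustration; a real client should read the limits from /project-xxxx/describe rather than hard-coding the defaults.

```python
# Default limits quoted above for projects in aws: regions.
DEFAULT_LIMITS = {
    "maximumPartSize": 5368709120,
    "minimumPartSize": 5242880,
    "maximumFileSize": 5497558138880,
    "maximumNumParts": 10000,
    "emptyLastPartAllowed": True,
}


def plan_parts(file_size, limits=DEFAULT_LIMITS, preferred_part_size=100 * 1024 * 1024):
    """Return a list of (index, offset, size) tuples covering file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if file_size > limits["maximumFileSize"]:
        raise ValueError("file exceeds maximumFileSize")
    if file_size == 0:
        if not limits["emptyLastPartAllowed"]:
            raise ValueError("an empty part is not allowed by these limits")
        return [(1, 0, 0)]

    max_parts = limits["maximumNumParts"]
    # Smallest part size that keeps the part count within maximumNumParts.
    smallest_allowed = (file_size + max_parts - 1) // max_parts
    part_size = max(preferred_part_size, smallest_allowed, limits["minimumPartSize"])
    part_size = min(part_size, limits["maximumPartSize"])
    if part_size * max_parts < file_size:
        raise ValueError("file cannot be split within the part limits")

    parts = []
    offset = 0
    index = 1
    while offset < file_size:
        size = min(part_size, file_size - offset)
        parts.append((index, offset, size))
        offset += size
        index += 1
    return parts
```

Every part except the last has the same size, so only the last part can fall below minimumPartSize, which the limits permit.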
Downloading
The download call returns a preauthenticated URL which can be used to download the file via a simple HTTP GET. The service behind that URL supports the "Range" header of the HTTP standard, allowing for any byte range to be downloaded, and enabling compatibility with download accelerators that fetch multiple ranges in parallel to increase throughput.
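To illustrate how a client could split a download into parallel range requests, the sketch below computes inclusive byte ranges and the matching "Range" header values. The helper names and the `chunk_size` parameter are hypothetical; the header syntax follows the standard HTTP byte-range form.

```python
def byte_ranges(file_size, chunk_size):
    """Split [0, file_size) into inclusive (first, last) byte positions."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, file_size) - 1)
        for start in range(0, file_size, chunk_size)
    ]


def range_header(first, last):
    """Build the HTTP Range header for one inclusive byte range."""
    return {"Range": f"bytes={first}-{last}"}
```

Each header would be added to a separate GET request against the download URL, and the responses written back at the corresponding offsets.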
Removal From a Project
Removing an unclosed file object from the project triggers:
Invalidation of all upload and download URLs
Closure or failure (with a 500 code) of any existing URL connections
File API Method Specifications
API method: /file/new
Specification
Creates a new file object in the "open" state. Optionally specify an Internet Media Type to associate with the file. DNAnexus provides this in the "Content-Type:" HTTP header for download requests, allowing web browsers to identify file types and handle downloaded files appropriately. All values are accepted without further validation (and sent back as-is in the "Content-Type:" header when a file is downloaded), so long as they contain only characters in the ASCII range 33-126. If the "media" field is not provided, or is set to "", the system attempts to auto-detect the Internet Media Type.
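A client can check the character constraint on "media" before sending the request. The following is a minimal sketch of that check; the function name `is_valid_media` is hypothetical, and the server remains the authority on what it accepts.

```python
def is_valid_media(media: str) -> bool:
    """True if every character falls in the ASCII range 33-126.

    The empty string passes the check; per the description above, it
    asks the system to auto-detect the Internet Media Type.
    """
    return all(33 <= ord(ch) <= 126 for ch in media)
```

Note that a space is ASCII 32, so a value such as "text/plain; charset=utf-8" would be rejected by this check.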
Inputs
project string ID of the project or container to which the file should belong, such as the string "project-xxxx"
name string (optional, default is the new ID) The name of the object
tags array of strings (optional) Tags to associate with the object
types array of strings (optional) Types to associate with the object
hidden boolean (optional, default false) Whether the object should be hidden
properties mapping (optional) Properties to associate with the object
  key Property name
  value string Property value
details mapping or array (optional, default { }) JSON object or array that is to be associated with the object. See the Object Details section for details on valid input
folder string (optional, default "/") Full path of the folder that is to contain the new object
parents boolean (optional, default false) Whether all folders in the path provided in folder should be created if they do not exist
media string (optional, default "") The Internet Media Type (formerly known as MIME type or Content-type) of the file
nonce string (optional) Unique identifier for this request. Ensures that even if multiple requests fail and are retried, only a single file is created. For more information, see Nonces.
Outputs
id string ID of the created file object, for example, a string in the form "file-xxxx"
Errors
InvalidInput
A reserved linking string ("$dnanexus_link") appears as a key in a hash in details but is not the only key in the hash
A reserved linking string ("$dnanexus_link") appears as the only key in a hash in details but has value other than a string
The key "media" (if provided) contains at least one character outside of the ASCII range 33-126
For each property key-value pair, the size, encoded in UTF-8, of the property key may not exceed 100 bytes and the property value may not exceed 700 bytes
A nonce was reused in a request but other inputs had changed, signifying a new and different request
A nonce may not exceed 128 bytes
PermissionDenied
UPLOAD access required
File creation restricted to job context in externalUploadRestricted project
Project's defaultSymlink drive is not accessible to perform this action
Action failed because CreateMultiPartUpload is not available for this drive
InvalidType
project is not a project ID
ResourceNotFound
The specified project is not found
The path in folder does not exist, and parents is false
API method: /file-xxxx/upload
Specification
Informs the system that a file part (identified by a particular index) needs to be uploaded, and retrieves a "part upload URL" (specific to this part) for performing the upload of that part. This method needs to be called at least once during the file object lifecycle. Once this method is called for a particular index, then data for that part must be provided to the corresponding part upload URL before calling the "close" method.
The part upload URL returned by this method may refer to a different endpoint than the DNAnexus API server, and accepts HTTP PUT requests supplying the binary data for the file part. Any PUT request to the part upload URL must be initiated shortly after its generation, otherwise a new URL for the part must be generated with another call to upload. The PUT request MUST include all HTTP headers that are specified in the API server's response to upload (see below). A "Content-Type" header should not be supplied, since the Internet Media Type is not set separately for each part.
The part upload URL has support for CORS with the following configuration:
SSL is required (from an origin served over https)
Part uploads must use the HTTP PUT method
Allowed HTTP headers
content-length
origin
content-md5
accept
content-type
x-amz-server-side-encryption
A successful request to a part upload URL receives an HTTP response with a 2xx response code and blank response body. An unsuccessful upload receives an HTTP response with an error response code.
This method may be called multiple times with the same index parameter. The system maintains a state for each part, which can be either "pending" or "complete". The first time this method is called for an index, the state of that part is set to "pending". If a PUT request to the part upload URL completes successfully, and in the meantime no other request has been made to that part upload URL, then the state is set to "complete". Users may upload to the same part index more than once (to reupload a piece). Subsequent upload requests reset the state back to "pending". If multiple overlapping requests are made to that part URL, the last successful request is considered the canonical one, and the part becomes pending or complete based on the fate of that last request.
All parts, except the part with the highest index, have a minimum size given by the fileUploadParameters.minimumPartSize field of the /project-xxxx/describe output. If the fileUploadParameters.emptyLastPartAllowed field of the /project-xxxx/describe has the value false, then the last part must contain at least 1 byte.
All parts have a maximum size given by the fileUploadParameters.maximumPartSize field of the /project-xxxx/describe output.
Inputs
size int The size in bytes of this file part
md5 string Hex encoding of the file part's MD5 message-digest
index int (optional, default 1) Number that determines the relative ordering of parts during the concatenation process that occurs in close. This must be at least 1, and at most the value fileUploadParameters.maximumNumParts returned by /project-xxxx/describe.
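As an illustration, the input for one part can be derived directly from the part's bytes. This sketch uses Python's standard hashlib; the helper name `upload_input` is hypothetical, and only the three field names come from the inputs listed above.

```python
import hashlib


def upload_input(part: bytes, index: int = 1) -> dict:
    """Build the JSON input for /file-xxxx/upload for one file part."""
    if index < 1:
        raise ValueError("index must be at least 1")
    return {
        "size": len(part),                      # size of the part in bytes
        "md5": hashlib.md5(part).hexdigest(),   # hex-encoded MD5 of the part
        "index": index,
    }
```

For example, `upload_input(b"hello", index=2)` yields a size of 5 and a 32-character hexadecimal md5 value.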
Outputs
url string A URL (of the https scheme) to which data may be sent via HTTP PUT
expires timestamp Time at which url expires, typically a few minutes after generation
headers mapping HTTP headers which must be supplied with any PUT request to url
  key Header field name
  value string Header value
These headers may contain authentication tokens. For security, do not store, log, print, or share them in any insecure way in production environments.
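The sketch below shows one way to send a part to the returned URL with the returned headers, using Python's standard http.client so that no "Content-Type" header is supplied by the caller. The function name `put_part` and the `connection_factory` parameter (included so the call can be exercised without a network) are assumptions for illustration, as is treating any 2xx status as success based on the response description above.

```python
import http.client
from urllib.parse import urlsplit


def put_part(upload_response, part, connection_factory=http.client.HTTPSConnection):
    """PUT one part to its upload URL; return True on a 2xx response.

    upload_response is the output of /file-xxxx/upload, with the
    "url" and "headers" fields described above.
    """
    target = urlsplit(upload_response["url"])
    path = target.path + ("?" + target.query if target.query else "")
    connection = connection_factory(target.netloc)
    try:
        # Pass through exactly the headers the API server returned.
        connection.request("PUT", path, body=part, headers=upload_response["headers"])
        response = connection.getresponse()
        response.read()  # a successful response has a blank body
        return 200 <= response.status < 300
    finally:
        connection.close()
```

Because the headers may carry authentication tokens, the sketch deliberately avoids logging or printing them.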
Errors
PermissionDenied
UPLOAD access required
File upload restricted to job context in externalUploadRestricted project
InvalidInput
size must be a non-negative integer, no greater than fileUploadParameters.maximumPartSize
If fileUploadParameters.emptyLastPartAllowed is false, size must be at least min(fileUploadParameters.minimumPartSize, 1)
md5 must be a hex string of the appropriate length
index (if provided) must be a positive integer, no greater than fileUploadParameters.maximumNumParts
InvalidState
The file object is not in the open state
API method: /file-xxxx/describe
Specification
Describes a file object (see also /record-xxxx/describe). Returns, among others, the Internet Media Type of the file as well as the state of the file object. If the file object is in the "closed" state, the file size is reported as well. If the "parts" key in the input map is "true", or the file object is in the "open" state, the response contains a "parts" key, whose value is a map describing the status of the parts that the system knows about. More specifically, for every part that the system has been informed of via an "upload" call, the "parts" map contains a key corresponding to the part index (represented as a string), whose value is a map with the part status. This includes the state, size, and md5 of the part. The state can be either "pending" or "complete".
Alternatively, you can use the /system/describeDataObjects method to describe many data objects at once.
As mentioned in the description of the "upload" call, a part enters the "pending" state for any of the following reasons:
A PUT to its part upload URL has not been successfully completed.
An earlier PUT to its part upload URL has been successfully completed, but the request initiated last is either ongoing or failed.
A part enters the "complete" state after a successful PUT to its part upload URL. For completed parts, the "size" field shows the amount of data received and the "md5" field contains the MD5 hash of the received data. For parts in "pending" state, both fields are set to null.
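The rules above can be modeled locally as a small state tracker. This is a sketch of one reading of those rules, in which a part is "complete" only when the PUT initiated last has succeeded; it is not the platform's implementation, and the class and method names are hypothetical.

```python
class PartState:
    """Models the state of a single part as reported by describe."""

    def __init__(self):
        self._last_started = 0  # id of the most recently initiated PUT
        self._succeeded = {}    # PUT id -> (size, md5) for successful PUTs

    def start_put(self):
        """Record that a new PUT to the part upload URL has begun."""
        self._last_started += 1
        return self._last_started

    def finish_put(self, put_id, ok, size=None, md5=None):
        """Record the outcome of a previously started PUT."""
        if ok:
            self._succeeded[put_id] = (size, md5)

    def describe(self):
        """Return the state, size, and md5 for this part."""
        if self._last_started in self._succeeded:
            size, md5 = self._succeeded[self._last_started]
            return {"state": "complete", "size": size, "md5": md5}
        # No PUT yet, or the PUT initiated last is ongoing or failed.
        return {"state": "pending", "size": None, "md5": None}
```

Under this model, a successful PUT followed by a second PUT that is still in flight leaves the part "pending" until that second request succeeds.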
A project ID can be provided as a hint to request user-provided metadata from a particular project. If the specified project does not contain the object and another project is found containing it where the user has VIEW permissions, that other project is used to return the metadata. The response includes the project ID used to return the user-provided metadata, whether it matches the provided hint or not. Details can be requested via this method, but remain hidden if the requestor lacks VIEW access.
Third-party data providers can apply watermarks to files. A watermarked file's content depends on:
The file id
The watermarkId and watermarkVersion associated with the file in a specific project
Updates to the watermark version by the data provider, which alter the watermarked file content
Inputs
project string (optional) Project or container ID to be used as a hint for finding the object in an accessible project. This field should be provided to get consistent output for watermarked files.
defaultFields boolean (optional, default false if fields is supplied, true otherwise) Whether to include the default set of fields in the output (the default fields are described in the "Outputs" section below). The selections are overridden by any fields explicitly named in fields.
fields mapping (optional) Include or exclude the specified fields from the output. These selections override the settings in defaultFields.
  key Desired output field. See the "Outputs" section below for valid values here
  value boolean Whether to include the field
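The interaction between defaultFields and fields can be sketched as a small resolver. This is a local model of the selection rules described above, not server code; the function name, its parameters, and the sample field names passed to it are chosen for illustration.

```python
def selected_fields(default_field_names, fields=None, default_fields=None):
    """Return the set of output fields implied by the describe inputs.

    default_field_names: names of the fields included by default.
    fields: the "fields" input mapping, or None if it was not supplied.
    default_fields: the "defaultFields" input, or None if not supplied.
    """
    if default_fields is None:
        # defaultFields defaults to false if fields is supplied, true otherwise.
        default_fields = fields is None
    selected = set(default_field_names) if default_fields else set()
    # Explicit entries in fields override the defaultFields selection.
    for name, include in (fields or {}).items():
        if include:
            selected.add(name)
        else:
            selected.discard(name)
    return selected
```

For example, supplying only `fields={"properties": True}` selects just "properties", while also passing `default_fields=True` keeps the default set and adds "properties" to it.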
The following options are deprecated (and are ignored when fields is present):
parts boolean (optional, default true if file is in the "open" state and false otherwise) Whether additional information for each part should be returned
properties boolean (optional, default false) Whether the properties should be returned
details boolean (optional, default false) Whether the details should also be returned
Outputs
id string The object ID, such as "file-xxxx"
The following fields are included by default (but can be disabled using fields or defaultFields):
project string ID of the project or container in which the object was found
class string The value "file"
types array of strings Types associated with the object
created timestamp Time at which this object was created
state string The value "open", "closing", or "closed"
hidden boolean Whether the object is hidden or not
links array of strings The object IDs that are pointed to from this object
name string The name of the object
folder string The full path to the folder containing the object
sponsored boolean Whether the object is sponsored by DNAnexus
tags array of strings Tags associated with the object
modified timestamp Time at which the user-provided metadata of the object was last modified
media string The Internet Media Type of the file
archivalState string The archival state of the file
createdBy mapping How the object was created
  user string ID of the user who created the object or launched an execution which created the object
  job string (present if a job created the object) ID of the job that created the object
  executable string (present if a job created the object) ID of the app or applet that the job was running
drive string The drive ID that the file is located in
symlinkPath mapping Remote path of the symlink
  container string The container name. For AWS S3 this is region:bucket and for Azure Blob this is containerName.
  object string The remote path of the symlink
md5 string Hex encoding of the whole file part's MD5 message-digest. This field applies only to readable symlink files.
The following field is included by default if the file is open:
parts mapping Information on the file parts that have been or are being uploaded
  key Part index that has been provided to any /file-xxxx/upload calls on the file so far
  value mapping Information on the file part with key/values:
    state string Either "pending" or "complete"
    size int or null The size of the part (in bytes) if state is "complete". Null otherwise
    md5 string or null The hexadecimal encoded value of MD5 message-digest (as defined in RFC 1321) of the data if state is "complete". Null otherwise
The following field (included by default) is only available if the object is in the "closed" state:
size int Size of the file in bytes
The following field (included by default) is available if the object is sponsored by a third party:
sponsoredUntil timestamp Indicates the expiration time of data sponsorship (this field appears only for sponsored objects and specifies a future expiration time)
The following fields are only returned if the corresponding field in the fields input is set to true:
properties mapping Properties associated with the object
  key Property name
  value string Property value
details mapping or array Contents of the object's details
watermarkId string ID of the watermark applied to the file's content during download
watermarkVersion string Version of the watermark's content applied to the file's content during download
resolvedPolicies mapping A mapping of policies that affect file-xxxx within the scope of a single project. You must specify project in the input to receive consistent results. Data providers can update policies at any time. Fields in this mapping include:
  isExternalDownloadable boolean True if file-xxxx can be downloaded, false otherwise.
Errors
ResourceNotFound
project, if specified, does not exist
PermissionDenied
VIEW access required to some project that contains the file object
If project is specified, VIEW access is required to that project
API method: /file-xxxx/close
Specification
Initiates finalization of the file object, if it is not already in the "closed" state.
To close a file object, there must be at least one part, and all parts must be in the "complete" state. If this call is successful, it returns immediately and the file object advances to the "closing" state. The system concatenates the parts in order of increasing part index (and those indices do not have to be consecutive). After completion, the file object advances to the "closed" state. For a more detailed discussion refer to the section "Uploading".
All parts, except the part with the highest index, have a minimum size given by the fileUploadParameters.minimumPartSize field of the /project-xxxx/describe output.
The part with the highest index must contain at least one byte if fileUploadParameters.emptyLastPartAllowed is false.
The total file size cannot exceed the size given by the fileUploadParameters.maximumFileSize field of the /project-xxxx/describe output.
If fileUploadParameters.emptyLastPartAllowed is true, there must be at least one part.
A call to this method on a closed file succeeds with a detail field set as shown in "Outputs" below.
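Before calling close, a client can check these preconditions against the "parts" map returned by /file-xxxx/describe. The sketch below is a local pre-check under that assumption, not a substitute for the server's validation; the function name `close_violations` and the wording of the returned messages are hypothetical.

```python
def close_violations(parts, limits):
    """List reasons why close would be rejected; empty if none were found.

    parts: the "parts" map from describe, keyed by part index as a string,
           with "state" and "size" entries for each part.
    limits: the fileUploadParameters mapping from /project-xxxx/describe.
    """
    if not parts:
        return ["there are no parts"]
    problems = []
    if any(part["state"] == "pending" for part in parts.values()):
        problems.append("at least one part is pending")
    highest = max(parts, key=int)  # keys are strings, so compare numerically
    for index, part in parts.items():
        if part["state"] != "complete":
            continue  # size is null for pending parts
        if index != highest and part["size"] < limits["minimumPartSize"]:
            problems.append(f"part {index} is smaller than minimumPartSize")
    last = parts[highest]
    if last["state"] == "complete" and last["size"] == 0 and not limits["emptyLastPartAllowed"]:
        problems.append("last part is empty")
    total = sum(p["size"] for p in parts.values() if p["state"] == "complete")
    if total > limits["maximumFileSize"]:
        problems.append("file exceeds maximumFileSize")
    return problems
```

An empty result suggests the file is ready to close; a non-empty result names the parts to reupload or the limit that would be exceeded.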
Inputs
None
Outputs
id string ID of the manipulated object, such as "file-xxxx"
If the object is in the closed state:
detail string An explanatory message
Errors
PermissionDenied
UPLOAD access required
File closing restricted to job context in externalUploadRestricted project
InvalidState
fileUploadParameters.emptyLastPartAllowed is true and there are zero parts
At least one part is in the "pending" state
There exists a part, other than the one with the highest part index, whose size is less than fileUploadParameters.minimumPartSize bytes
fileUploadParameters.emptyLastPartAllowed is false and the part with the highest index has 0 bytes
The file has size larger than fileUploadParameters.maximumFileSize bytes
API method: /file-xxxx/download
Specification
Generates a "download URL" for downloading the contents of this file object. The download URL may refer to a different endpoint than the DNAnexus API server, and accepts HTTP GET requests.
Requests to the download URL must be initiated within the number of seconds specified in the "duration" input parameter (starting from the time this call is made, according to the server). After this duration, the URL expires. GET requests MUST include any headers specified in the API server's response to /file-xxxx/download (see below). The download URL also honors the HTTP Range request headers, enabling clients to download only a particular byte range of the file.
Include project context in paths using formats like project-xxxx:file-yyyy or project-xxxx:/path/to/file.txt.
The download URL implements CORS support:
GET requests with an "Origin" header receive a matching "Access-Control-Allow-Origin" response header
OPTIONS preflight requests are accepted with "Access-Control-Request-Method: GET"
Preflight responses include:
"Access-Control-Allow-Origin": matches request Origin
"Access-Control-Allow-Headers": matches request Access-Control-Request-Headers
"Access-Control-Max-Age": 1 hour
Successful calls to the download URL receive the HTTP response code 200, and include a "Content-Type" header, set to whatever Internet Media Type was specified when the file object was created, and a "Content-Disposition: attachment" header that may also include a filename, if requested (see below). The request may include the query string "?inline" to override the Content-Disposition header. Unsuccessful requests receive an HTTP error response code (and in that case there are no guarantees about the response body, as the download URL does not necessarily conform to the general API rules regarding error messages).
Inputs
duration int (optional, default is 3600 seconds (1 hour)) Number of seconds (starting from the time this call is made, according to the server) during which the generated URL is valid. The maximum allowed duration is specified by the maximumPreauthenticatedDuration org policy. Setting duration to 0 is equal to using the maximumPreauthenticatedDuration value.
  Setting duration below a minimum threshold, typically at least 300 seconds (5 minutes), may cause dependent functionality to break. For example, File Viewers and some automated tools may require URLs to be valid for 3-5 minutes to complete downloads or viewing sessions. Ensure the duration is sufficient for all intended use cases.
filename string (optional) The desired filename of the downloaded file, to be affixed to the returned URL. If provided, this filename is encoded as a URI component and affixed to the download URL, whose resource part ends in, for example, '/filename', to ease downloads through web browsers and utilities such as wget.
project string (optional) ID of a project containing the file, with which the download URL is associated. Requests to the download URL succeed only when the file resides in this project and the user who generated the URL has at least VIEW permission to this project. If this value is not provided, the URL remains valid as long as the file resides in any project where the user who generated the URL has at least VIEW permission. This field is required to get the download URL for a watermarked file when invoked outside the context of a DNAnexus job.
preauthenticated boolean (optional, default false) Whether to generate a "preauthenticated" download URL, which embeds any necessary authentication information in the URL itself, rather than requiring separate request headers
  Preauthenticated URLs grant access to file data to anyone who has the link. To protect sensitive information, avoid storing, logging, printing, or sharing these URLs in insecure ways, especially in production environments.
  For improved security, always generate preauthenticated URLs that are specific to a project.
stickyIP boolean (optional if preauthenticated is true, required to be false otherwise, default false) Whether HTTP GET requests to the preauthenticated download URL should be restricted to a single origin IP address. If stickyIP and preauthenticated are true, the IP address of the first HTTP GET request to the preauthenticated download URL becomes the only allowed origin for subsequent requests.
Outputs
url string An absolute URL for downloading the file via HTTP GET requests.
headers mapping HTTP headers which MUST be supplied with any GET request to the url
  key Header field name
  value string Header value
The headers may contain authentication tokens. For security, do not store, log, print, or share them in any insecure way in production environments.
For preauthenticated URL requests, the headers contain no keys.
Errors
ResourceNotFound
project is specified but the file object is not in the specified project
PermissionDenied
VIEW access required to some project that contains the file object
If project is specified, VIEW access is required to that project
InvalidInput
duration (if provided) is not a positive integer
InvalidState
The file object is not in the "closed" state
File downloads in web applications
To generate non-preauthenticated file download URLs, web applications (running inside web browsers) should make /file-xxxx/download requests to the separate endpoint https://dl.dnanex.us instead of https://api.dnanexus.com. Browser requests to non-preauthenticated file download URLs are authenticated by a URL-specific cookie, set by the API server's response to the /file-xxxx/download route on this separate endpoint.
Non-browser-based applications implementing the above specification, or web applications only needing preauthenticated download URLs, may call /file-xxxx/download on https://api.dnanexus.com.