Files

A file object can be used to store an opaque array of bytes, which is what is traditionally known as a "file". File objects contain binary data, and are immutable. After a file has been uploaded, its contents cannot be modified.

Lifecycle

File objects follow a three-state lifecycle that determines what actions you can perform:

  • Open - When you create a new file using the new method, it starts empty in the "open" state. You can upload file content in multiple parts using the upload method.

  • Closing - When you call close, the file moves to the "closing" state while the system finalizes it. During this time, you cannot upload to or download from the file. This process can take seconds to minutes depending on file size.

  • Closed - Once finalization completes, the file enters the "closed" state. The file becomes available for download using the download method, but its contents cannot be modified.

The system treats a file in the "open" or "closing" state that has been inactive for 24 hours as abandoned. It sends a notification at the 24-hour mark and deletes abandoned files after a few days.

Uploading

Given the size of genomic datasets, transferring a large file over a single HTTP call is impractical. The DNAnexus Platform supports uploading files in multiple, smaller parts, which enables robust, resumable, and parallel uploads. To encourage efficient uploads, the system restricts part sizes; by default, parts must be between 5 MiB and 5 GiB (see "Limits on Parts").

The upload call takes arguments that identify the part to be uploaded and describe it. The server returns a preauthenticated upload URL specific to that file object and part index, along with headers that the client must send with the subsequent HTTP PUT. The client then uploads the part by issuing an HTTP PUT to that URL with the content of the part as the body (for example, with "curl -X PUT -T <part-file> <url>"), supplying the returned headers and no other authentication headers. The same part can be uploaded more than once, by repeating both the upload call and the matching PUT. Only the last successful PUT is considered canonical.

The close call performs finalization by concatenating the parts. Once the file is closed, the part distinction is removed and the original file becomes available for download using the download call. Parts are concatenated in order of ascending part index. Indices do not need to be consecutive.
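
The ordering rule above can be illustrated with a few lines of Python. This is only a local sketch of the behavior described on this page, not the platform's implementation; the function name concatenate_parts is hypothetical.

```python
def concatenate_parts(parts: dict[int, bytes]) -> bytes:
    """Join part payloads in ascending index order; gaps in the indices are fine."""
    return b"".join(parts[index] for index in sorted(parts))


# Indices 1, 5 and 10 are not consecutive, yet the order is well defined.
example = concatenate_parts({10: b"GT", 1: b"AC", 5: b"TT"})
print(example)  # b'ACTTGT'
```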

Closing a file object is only possible after all parts have been uploaded, that is, when for every index supplied in any upload call, the user has successfully performed a PUT to the respective URL received. Closing does not complete until all parts have been successfully uploaded. If the user does not complete a part upload for any file part previously created through an upload call, the close call succeeds but the file remains in "closing" state until marked as abandoned and deleted. Therefore, ensure all parts have been successfully uploaded before calling close.

The closing process takes a few seconds to minutes, depending on the file size. Larger files may require more time.

The user can query the status of the file object by using the describe call.
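
One way to wait for finalization is to poll describe until the state becomes "closed". The sketch below is an assumption-laden illustration: wait_until_closed is a hypothetical helper, and the describe argument stands in for whatever function the client uses to call /file-xxxx/describe and return the parsed response. The polling interval and timeout are arbitrary choices, not platform recommendations.

```python
import time
from typing import Callable


def wait_until_closed(
    describe: Callable[[], dict],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a describe-like callable until it reports state "closed"."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = describe()
        if description["state"] == "closed":
            return description
        if description["state"] != "closing":
            # Polling only makes sense after close has been called.
            raise RuntimeError(f"unexpected state: {description['state']!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("file did not reach the closed state in time")
        sleep(poll_seconds)
```

Passing the describe function and the sleep function as arguments keeps the helper independent of any particular HTTP client and makes it easy to exercise without network access.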

Limits on Parts

The fileUploadParameters field in the /project-xxxx/describe output specifies file upload limits for the project or container:

  • Parts have a maximum size, in bytes

  • Parts may have a minimum size, in bytes

  • The completed file has a maximum size, in bytes

  • A maximum number of parts may be uploaded

  • A minimum number of parts may be required

See the documentation of /project-xxxx/describe for further details about how to interpret the fileUploadParameters field. The client should call this route before beginning the upload to get the applicable limits and break the file into appropriately sized parts.

For reference, the default parameters (for projects whose region begins with aws:) are the following:

  • maximumPartSize: 5368709120 (5 GiB)

  • minimumPartSize: 5242880 (5 MiB)

  • maximumFileSize: 5497558138880 (5 TiB)

  • maximumNumParts: 10000

  • emptyLastPartAllowed: true
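
As a rough sketch of how a client might turn these limits into an upload plan, the Python below splits a file size into (index, offset, size) tuples. The names AWS_DEFAULTS and plan_parts are hypothetical, the strategy of choosing the smallest permitted part size is only one option, and a real client should read the values from /project-xxxx/describe rather than hard-coding them. Representing an empty file as a single empty part is an assumption, and any minimum number of parts is not handled here.

```python
# Defaults quoted above for projects whose region begins with aws:.
AWS_DEFAULTS = {
    "maximumPartSize": 5368709120,
    "minimumPartSize": 5242880,
    "maximumFileSize": 5497558138880,
    "maximumNumParts": 10000,
    "emptyLastPartAllowed": True,
}


def plan_parts(file_size: int, params: dict = AWS_DEFAULTS) -> list[tuple[int, int, int]]:
    """Return (index, offset, size) tuples that cover file_size bytes."""
    if file_size < 0 or file_size > params["maximumFileSize"]:
        raise ValueError("file size is outside the allowed range")
    if file_size == 0:
        if not params["emptyLastPartAllowed"]:
            raise ValueError("an empty last part is not allowed here")
        return [(1, 0, 0)]  # assumption: an empty file is one empty part
    # Integer ceiling division: bytes per part needed to stay within maximumNumParts.
    needed = -(-file_size // params["maximumNumParts"])
    part_size = max(params["minimumPartSize"], needed, 1)
    if part_size > params["maximumPartSize"]:
        raise ValueError("file cannot be split within the part limits")
    return [
        (index, offset, min(part_size, file_size - offset))
        for index, offset in enumerate(range(0, file_size, part_size), start=1)
    ]
```

With the defaults above, a 12 MiB file is planned as two 5 MiB parts followed by a 2 MiB last part, which is allowed because only parts other than the last must meet the minimum size.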

Downloading

The download call returns a preauthenticated URL which can be used to download the file via a simple HTTP GET. The service behind that URL supports the "Range" header of the HTTP standard, allowing for any byte range to be downloaded, and enabling compatibility with download accelerators that fetch multiple ranges in parallel to increase throughput.
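
To show how a client could divide a download into byte ranges, the sketch below produces the "Range" header values for consecutive chunks. The function name range_headers and the chunk size are illustrative choices; each value would be sent as the Range header of a separate GET to the download URL.

```python
def range_headers(file_size: int, chunk_size: int) -> list[str]:
    """Return HTTP Range header values that together cover the whole file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        # Range end positions are inclusive, hence the "- 1".
        f"bytes={start}-{min(start + chunk_size, file_size) - 1}"
        for start in range(0, file_size, chunk_size)
    ]


print(range_headers(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```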

Removal From a Project

Removing an unclosed file object from the project triggers:

  • Invalidation of all upload and download URLs

  • Closure or failure (with a 500 code) of any existing URL connections

A file increases byte usage by its size. Byte usage is counted on upload completion.

File API Method Specifications

API method: /file/new

Specification

Creates a new file object in the "open" state. Optionally specify an Internet Media Type to associate with the file. DNAnexus provides this in the "Content-Type:" HTTP header for download requests, allowing web browsers to identify file types and handle downloaded files appropriately. All values are accepted without further validation (and sent back as-is in the "Content-Type:" header when a file is downloaded), so long as they contain only characters in the ASCII range 33-126. If the "media" field is not provided, or is set to "", the system attempts to auto-detect the Internet Media Type.

Inputs

  • project string ID of the project or container to which the file should belong, such as the string "project-xxxx"

  • name string (optional, default is the new ID) The name of the object

  • tags array of strings (optional) Tags to associate with the object

  • types array of strings (optional) Types to associate with the object

  • hidden boolean (optional, default false) Whether the object should be hidden

  • properties mapping (optional) Properties to associate with the object

    • key Property name

    • value string Property value

  • details mapping or array (optional, default { }) JSON object or array that is to be associated with the object. See the Object Details section for details on valid input

  • folder string (optional, default "/") Full path of the folder that is to contain the new object

  • parents boolean (optional, default false) Whether all folders in the path provided in folder should be created if they do not exist

  • media string (optional, default "") The Internet Media Type (formerly known as MIME type or Content-type) of the file

  • nonce string (optional) Unique identifier for this request. Ensures that even if multiple requests fail and are retried, only a single file is created. For more information, see Nonces.

Outputs

  • id string ID of the created file object, for example, a string in the form "file-xxxx"

Errors

  • InvalidInput

    • A reserved linking string ("$dnanexus_link") appears as a key in a hash in details but is not the only key in the hash

    • A reserved linking string ("$dnanexus_link") appears as the only key in a hash in details but has value other than a string

    • The key "media" (if provided) contains at least one character outside of the ASCII range 33-126

    • For each property key-value pair, the size, encoded in UTF-8, of the property key may not exceed 100 bytes and the property value may not exceed 700 bytes

    • A nonce was reused in a request but other inputs had changed signifying a new and different request

    • A nonce may not exceed 128 bytes

  • PermissionDenied

    • UPLOAD access required

    • File creation restricted to job context in externalUploadRestricted project

    • Project's defaultSymlink drive is not accessible to perform this action

    • Action failed because CreateMultiPartUpload is not available for this drive

  • InvalidType

    • project is not a project ID

  • ResourceNotFound

    • The specified project is not found

    • The path in folder does not exist, and parents is false
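
Some of the InvalidInput rules above can be checked on the client before sending the request. The following sketch is a hypothetical helper (new_file_input_problems is not part of any DNAnexus library) that mirrors the media, property size, and nonce length rules; measuring the nonce in UTF-8 bytes is an assumption. The server remains the authority on validation.

```python
import uuid


def new_file_input_problems(request: dict) -> list[str]:
    """Return client-side findings that mirror some InvalidInput rules above."""
    problems = []
    media = request.get("media", "")
    if any(not 33 <= ord(ch) <= 126 for ch in media):
        problems.append("media contains a character outside ASCII 33-126")
    for key, value in request.get("properties", {}).items():
        if len(key.encode("utf-8")) > 100:
            problems.append(f"property key too long: {key[:20]!r}")
        if len(value.encode("utf-8")) > 700:
            problems.append(f"property value too long for key {key[:20]!r}")
    if len(request.get("nonce", "").encode("utf-8")) > 128:
        problems.append("nonce exceeds 128 bytes")
    return problems


request = {
    "project": "project-xxxx",
    "name": "reads.fastq.gz",
    "media": "application/gzip",
    "nonce": uuid.uuid4().hex,  # reuse this same value when retrying the request
}
print(new_file_input_problems(request))  # []
```

Note that a media value such as "text/plain; charset=utf-8" is flagged by this check because the space character (ASCII 32) falls outside the accepted range.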

API method: /file-xxxx/upload

Specification

Informs the system that a file part (identified by a particular index) needs to be uploaded, and retrieves a "part upload URL" (specific to this part) for performing the upload of that part. This method needs to be called at least once during the file object lifecycle. Once this method is called for a particular index, then data for that part must be provided to the corresponding part upload URL before calling the "close" method.

The part upload URL returned by this method may refer to a different endpoint than the DNAnexus API server, and accepts HTTP PUT requests supplying the binary data for the file part. Any PUT request to the part upload URL must be initiated shortly after its generation, otherwise a new URL for the part must be generated with another call to upload. The PUT request MUST include all HTTP headers that are specified in the API server's response to upload (see below). A "Content-Type" header should not be supplied, since the Internet Media Type is not set separately for each part.

The part upload URL has support for CORS with the following configuration:

  • SSL is required (from an origin served over https)

  • Part uploads must use the HTTP PUT method

  • Allowed HTTP headers

    • content-length

    • origin

    • content-md5

    • accept

    • content-type

    • x-amz-server-side-encryption

A successful request to a part upload URL receives an HTTP response with a 2xx response code and blank response body. An unsuccessful upload receives an HTTP response with an error response code.

This method may be called multiple times with the same index parameter, for example to reupload a part. The system maintains a state for each part, which is either "pending" or "complete". The first time this method is called for an index, the state of that part is set to "pending". If the PUT request to the part upload URL completes successfully, and in the meantime no other request has been made to that part upload URL, the state is set to "complete". Each subsequent call to this method for the same index resets the state to "pending". If multiple overlapping requests are made to that part upload URL, the last successful request is considered the canonical one, and the part becomes pending or complete based on the fate of that last request.

All parts, except the part with the highest index, have a minimum size given by the fileUploadParameters.minimumPartSize field of the /project-xxxx/describe output. If the fileUploadParameters.emptyLastPartAllowed field of the /project-xxxx/describe output has the value false, then the last part must contain at least 1 byte.

All parts have a maximum size given by the fileUploadParameters.maximumPartSize field of the /project-xxxx/describe output.

Inputs

  • size int The size in bytes of this file part

  • md5 string Hex encoding of the file part's MD5 message-digest

  • index int (optional, default 1) Number that determines the relative ordering of parts during the concatenation process that occurs in close. This must be at least 1, and at most the value fileUploadParameters.maximumNumParts returned by /project-xxxx/describe.

Outputs

  • url string A URL (of the https scheme) to which data may be sent via HTTP PUT

  • expires timestamp Time at which url expires, typically a few minutes after generation

  • headers mapping HTTP headers which must be supplied with any PUT request to url

    • key Header field name

    • value string Header value

    • These headers may contain authentication tokens. For security, do not store, log, print, or share them in any insecure way in production environments.

Errors

  • PermissionDenied

    • UPLOAD access required

    • File upload restricted to job context in externalUploadRestricted project

  • InvalidInput

    • size must be a non-negative integer, no greater than fileUploadParameters.maximumPartSize

    • If fileUploadParameters.emptyLastPartAllowed is false, size must be at least min(fileUploadParameters.minimumPartSize, 1)

    • md5 must be a hex string of the appropriate length

    • index (if provided) must be a positive integer, no greater than fileUploadParameters.maximumNumParts

  • InvalidState

    • The file object is not in the open state
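
The inputs and outputs of this method can be sketched from the client side as follows. Both helper names (build_upload_input, put_part) are hypothetical. The sketch uses Python's http.client rather than a higher-level client so that no "Content-Type" header is added implicitly; url and headers are assumed to come from the response to /file-xxxx/upload.

```python
import hashlib
import http.client
from urllib.parse import urlsplit


def build_upload_input(data: bytes, index: int = 1) -> dict:
    """Build the /file-xxxx/upload input for one part."""
    if index < 1:
        raise ValueError("index must be at least 1")
    return {"size": len(data), "md5": hashlib.md5(data).hexdigest(), "index": index}


def put_part(url: str, headers: dict, data: bytes) -> int:
    """PUT one part to its part upload URL and return the HTTP status code."""
    target = urlsplit(url)
    path = (target.path or "/") + (f"?{target.query}" if target.query else "")
    connection = http.client.HTTPSConnection(target.netloc)
    try:
        # Only the headers returned by upload are passed in; http.client adds
        # Host and, if absent, Content-Length, but no Content-Type.
        connection.request("PUT", path, body=data, headers=headers)
        response = connection.getresponse()
        response.read()
        return response.status  # a 2xx code indicates success
    finally:
        connection.close()


print(build_upload_input(b"hello", index=2))
```

Because the part upload URL expires shortly after generation, a client would typically call upload and then put_part back to back for each part, requesting a fresh URL on retry.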

API method: /file-xxxx/describe

Specification

Describes a file object (see also /record-xxxx/describe). Returns, among other fields, the Internet Media Type of the file and the state of the file object. If the file object is in the "closed" state, the file size is reported as well. If the "parts" key in the input is true, or the file object is in the "open" state, the response contains a "parts" key whose value is a map describing the status of the parts that the system knows about. For every part that the system has been informed of via an "upload" call, the "parts" map contains a key corresponding to the part index (represented as a string), whose value is a map with the state, size, and md5 of the part. The state can be either "pending" or "complete".

Alternatively, you can use the /system/describeDataObjects method to describe many data objects at once.

As mentioned in the description of the "upload" call, a part enters the "pending" state for any of the following reasons:

  • A PUT to its part upload URL has not been successfully completed.

  • An earlier PUT to its part upload URL has been successfully completed, but the request initiated last is either ongoing or failed.

A part enters the "complete" state after a successful PUT to its part upload URL. For completed parts, the "size" field shows the amount of data received and the "md5" field contains the MD5 hash of the received data. For parts in "pending" state, both fields are set to null.

A project ID can be provided as a hint to request user-provided metadata from a particular project. If the specified project does not contain the object and another project is found containing it where the user has VIEW permissions, that other project is used to return the metadata. The response includes the project ID used to return the user-provided metadata, whether it matches the provided hint or not. Details can be requested via this method, but remain hidden if the requestor lacks VIEW access.

Third-party data providers can apply watermarks to files. A watermarked file's content depends on:

  • The file id

  • The watermarkId and watermarkVersion associated with the file in a specific project

  • Updates to the watermark version by the data provider, which alter the watermarked file content

Inputs

  • project string (optional) Project or container ID to be used as a hint for finding the object in an accessible project. This field should be provided to get consistent output for watermarked files.

  • defaultFields boolean (optional, default false if fields is supplied, true otherwise) Whether to include the default set of fields in the output (the default fields are described in the "Outputs" section below). The selections are overridden by any fields explicitly named in fields.

  • fields mapping (optional) Include or exclude the specified fields from the output. These selections override the settings in defaultFields.

    • key Desired output field. See the "Outputs" section below for valid values here

    • value boolean Whether to include the field

The following options are deprecated (and are ignored when fields is present):

  • parts boolean (optional, default true if file is in the "open" state and false otherwise) Whether additional information for each part should be returned

  • properties boolean (optional, default false) Whether the properties should be returned

  • details boolean (optional, default false) Whether the details should also be returned
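
As a small illustration of the fields and defaultFields inputs, the sketch below builds a request body that asks only for selected fields. The helper name build_describe_input is hypothetical; the field names used in the example come from the "Outputs" section.

```python
import json


def build_describe_input(project=None, fields=None, default_fields=None) -> dict:
    """Build a /file-xxxx/describe input that requests the named fields."""
    body = {}
    if project is not None:
        body["project"] = project
    if fields:
        body["fields"] = {name: True for name in fields}
    if default_fields is not None:
        body["defaultFields"] = default_fields
    return body


body = build_describe_input(project="project-xxxx", fields=["state", "parts", "size"])
print(json.dumps(body))
# {"project": "project-xxxx", "fields": {"state": true, "parts": true, "size": true}}
```

Because defaultFields defaults to false when fields is supplied, this request returns the id plus the three named fields, where available.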

Outputs

  • id string The object ID, such as "file-xxxx"

The following fields are included by default (but can be disabled using fields or defaultFields):

  • project string ID of the project or container in which the object was found

  • class string The value "file"

  • types array of strings Types associated with the object

  • created timestamp Time at which this object was created

  • state string The value "open", "closing", or "closed"

  • hidden boolean Whether the object is hidden or not

  • links array of strings The object IDs that are pointed to from this object

  • name string The name of the object

  • folder string The full path to the folder containing the object

  • sponsored boolean Whether the object is sponsored by DNAnexus

  • tags array of strings Tags associated with the object

  • modified timestamp Time at which the user-provided metadata of the object was last modified

  • media string The Internet Media Type of the file

  • archivalState string The archival state of the file

  • createdBy mapping How the object was created

    • user string ID of the user who created the object or launched an execution which created the object

    • job string (present if a job created the object) ID of the job that created the object

    • executable string (present if a job created the object) ID of the app or applet that the job was running

  • drive string The drive ID that the file is located in

  • symlinkPath mapping Remote path of the symlink

    • container string The container name. For AWS S3 this is region:bucket and for Azure Blob this is containerName.

    • object string The remote path of the symlink

  • md5 string Hex encoding of the whole file's MD5 message-digest. This field applies only to readable symlink files.

The following field is included by default if the file is open:

  • parts mapping Information on the file parts that have been or are being uploaded

    • key Part index that has been provided to any /file-xxxx/upload calls on the file so far

    • value mapping Information on the file part with key/values:

      • state string Either "pending" or "complete"

      • size int or null The size of the part (in bytes) if state is "complete". Null otherwise

      • md5 string or null The hexadecimal encoded value of MD5 message-digest (as defined in RFC 1321) of the data if state is "complete". Null otherwise
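
A client can use the parts mapping to decide which parts still need to be uploaded. The sketch below is illustrative: pending_part_indices is a hypothetical helper, and the sample response is made up for the example (its values are not real platform output).

```python
def pending_part_indices(description: dict) -> list[int]:
    """Return the indices of parts still in the "pending" state, ascending."""
    parts = description.get("parts", {})
    # Part indices arrive as string keys, so convert them before sorting.
    return sorted(int(index) for index, info in parts.items() if info["state"] == "pending")


# Made-up sample shaped like the fields described above.
sample = {
    "id": "file-xxxx",
    "state": "open",
    "parts": {
        "7": {"state": "pending", "size": None, "md5": None},
        "1": {"state": "complete", "size": 5, "md5": "5d41402abc4b2a76b9719d911017c592"},
        "2": {"state": "pending", "size": None, "md5": None},
    },
}
print(pending_part_indices(sample))  # [2, 7]
```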

The following field (included by default) is only available if the object is in the "closed" state:

  • size int Size of the file in bytes

The following field (included by default) is available if the object is sponsored by a third party:

  • sponsoredUntil timestamp Indicates the expiration time of data sponsorship (this field appears only for sponsored objects and specifies a future expiration time)

The following fields are only returned if the corresponding field in the fields input is set to true:

  • properties mapping Properties associated with the object

    • key Property name

    • value string Property value

  • details mapping or array Contents of the object's details

  • watermarkId string ID of the watermark applied to the file's content during download

  • watermarkVersion string Version of the watermark's content applied to the file's content during download

  • resolvedPolicies mapping A mapping of policies that affect file-xxxx within the scope of a single project. You must specify project in the input to receive consistent results. Data providers can update policies at any time. Fields in this mapping include:

    • isExternalDownloadable boolean True if file-xxxx can be downloaded, false otherwise.

Errors

  • ResourceNotFound

    • project, if specified, does not exist

  • PermissionDenied

    • VIEW access required to some project that contains the file object

    • If project is specified, VIEW access is required to that project

API method: /file-xxxx/close

Specification

Initiates finalization of the file object, if it is not already in the "closed" state.

To close a file object, there must be at least one part, and all parts must be in the "complete" state. If this call is successful, it returns immediately and the file object advances to the "closing" state. The system concatenates the parts in order of increasing part index (and those indices do not have to be consecutive). After completion, the file object advances to the "closed" state. For a more detailed discussion refer to the section "Uploading".

All parts, except the part with the highest index, have a minimum size given by the fileUploadParameters.minimumPartSize field of the /project-xxxx/describe output.

The part with the highest index must contain at least one byte if fileUploadParameters.emptyLastPartAllowed is false.

The total file size cannot exceed the size given by the fileUploadParameters.maximumFileSize field of the /project-xxxx/describe output.

If fileUploadParameters.emptyLastPartAllowed is true, there must be at least one part.

A call to this method on a closed file succeeds with a detail field set as shown in "Outputs" below.

Inputs

None

Outputs

  • id string ID of the manipulated object, such as "file-xxxx"

If the object is in the closed state:

  • detail string An explanatory message

Errors

  • PermissionDenied

    • UPLOAD access required

    • File closing restricted to job context in externalUploadRestricted project

  • InvalidState

    • fileUploadParameters.emptyLastPartAllowed is true and there are zero parts

    • At least one part is in the "pending" state

    • There exists a part, other than the one with the highest part index, whose size is less than fileUploadParameters.minimumPartSize bytes

    • fileUploadParameters.emptyLastPartAllowed is false and the part with the highest index has 0 bytes

    • The file has size larger than fileUploadParameters.maximumFileSize bytes
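
Before calling close, a client may want to check the same conditions locally. The sketch below is a hypothetical helper (close_problems) that takes the parts mapping from /file-xxxx/describe and the fileUploadParameters values from /project-xxxx/describe. It reports zero parts as a problem in all cases, following the statement in the Specification that there must be at least one part.

```python
def close_problems(parts: dict, params: dict) -> list[str]:
    """List reasons, per the InvalidState rules above, why close would be rejected."""
    if not parts:
        return ["there are zero parts"]
    if any(info["state"] == "pending" for info in parts.values()):
        # Sizes are null for pending parts, so stop before the size checks.
        return ["at least one part is pending"]
    problems = []
    sizes = {int(index): info["size"] for index, info in parts.items()}
    indices = sorted(sizes)
    if any(sizes[i] < params["minimumPartSize"] for i in indices[:-1]):
        problems.append("a part other than the last is below minimumPartSize")
    if not params["emptyLastPartAllowed"] and sizes[indices[-1]] == 0:
        problems.append("the last part is empty")
    if sum(sizes.values()) > params["maximumFileSize"]:
        problems.append("total size exceeds maximumFileSize")
    return problems
```

An empty result means none of the listed conditions were detected locally; the server may still reject the call for other reasons, such as missing UPLOAD access.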

API method: /file-xxxx/download

Specification

Generates a "download URL" for downloading the contents of this file object. The download URL may refer to a different endpoint than the DNAnexus API server, and accepts HTTP GET requests.

Requests to the download URL must be initiated within the number of seconds specified in the "duration" input parameter (starting from the time this call is made, according to the server). After this duration, the URL expires. GET requests MUST include any headers specified in the API server's response to /file-xxxx/download (see below). The download URL also honors the HTTP Range request headers, enabling clients to download only a particular byte range of the file.

Include project context in paths using formats like project-xxxx:file-yyyy or project-xxxx:/path/to/file.txt.

The download URL implements CORS support:

  • GET requests with an "Origin" header receive a matching "Access-Control-Allow-Origin" response header

  • OPTIONS preflight requests are accepted with "Access-Control-Request-Method: GET"

  • Preflight responses include:

    • "Access-Control-Allow-Origin": matches request Origin

    • "Access-Control-Allow-Headers": matches request Access-Control-Request-Headers

    • "Access-Control-Max-Age": 1 hour

Successful calls to the download URL receive the HTTP response code 200, and include a "Content-Type" header, set to whatever Internet Media Type was specified when the file object was created, and a "Content-Disposition: attachment" header that may also include a filename, if requested (see below). The request may include the query string "?inline" to override the Content-Disposition header. Unsuccessful requests receive an HTTP error response code (and in that case there are no guarantees about the response body, as the download URL does not necessarily conform to the general API rules regarding error messages).

Inputs

  • duration int (optional, default is 3600 seconds (1 hour)) Number of seconds (starting from the time this call is made, according to the server) during which the generated URL is valid. The maximum allowed duration is specified by the maximumPreauthenticatedDuration org policy. Setting duration to 0 is equal to using the maximumPreauthenticatedDuration value.

    • Setting duration too low, typically below 300 seconds (5 minutes), may cause dependent functionality to break. For example, File Viewers and some automated tools may need URLs to remain valid for 3-5 minutes to complete downloads or viewing sessions. Ensure the duration is sufficient for all intended use cases.

  • filename string (optional) The desired filename of the downloaded file, to be affixed to the returned URL. If provided, this filename is encoded as a URI component and affixed to the download URL, whose resource part ends in, for example, '/filename', to ease downloads through web browsers and utilities such as wget.

  • project string (optional) ID of a project containing the file, with which the download URL is associated. Requests to the download URL succeed only when the file resides in this project and the user who generated the URL has at least VIEW permission to this project. If this value is not provided, the URL remains valid as long as the file resides in any project where the user who generated the URL has at least VIEW permission. This field is required to get the download URL for a watermarked file when invoked outside the context of a DNAnexus job.

  • preauthenticated boolean (optional, default false) Whether to generate a "preauthenticated" download URL, which embeds any necessary authentication information in the URL itself, rather than requiring separate request headers

    • Preauthenticated URLs grant access to file data to anyone who has the link. To protect sensitive information, avoid storing, logging, printing, or sharing these URLs in insecure ways, especially in production environments.

    • For improved security, always generate preauthenticated URLs that are specific to a project.

  • stickyIP boolean (optional if preauthenticated is true, required to be false otherwise, default false) Whether HTTP GET requests to the preauthenticated download URL should be restricted to a single origin IP address. If stickyIP and preauthenticated are true, the IP address of the first HTTP GET request to the preauthenticated download URL becomes the only allowed origin for subsequent requests.

Outputs

  • url string An absolute URL for downloading the file via HTTP GET requests.

  • headers mapping HTTP headers which MUST be supplied with any GET request to the url

    • key Header field name

    • value string Header value

    • The headers may contain authentication tokens. For security, do not store, log, print, or share them in any insecure way in production environments.

    • For preauthenticated URL requests, the headers contain no keys.

Errors

  • ResourceNotFound

    • project is specified but the file object is not in the specified project

  • PermissionDenied

    • VIEW access required to some project that contains the file object

    • If project is specified, VIEW access is required to that project

  • InvalidInput

    • duration (if provided) is not a positive integer

  • InvalidState

    • The file object is not in the "closed" state
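
The relationship between the preauthenticated and stickyIP inputs can be sketched as a small request builder. The helper name build_download_input is hypothetical, and the duration value is passed through unchecked so that the server applies its own rules.

```python
def build_download_input(duration=None, filename=None, project=None,
                         preauthenticated=False, sticky_ip=False) -> dict:
    """Build a /file-xxxx/download input, enforcing the stickyIP constraint."""
    if sticky_ip and not preauthenticated:
        raise ValueError("stickyIP may only be true when preauthenticated is true")
    body = {"preauthenticated": preauthenticated}
    if preauthenticated:
        body["stickyIP"] = sticky_ip
    if duration is not None:
        body["duration"] = duration
    if filename is not None:
        body["filename"] = filename
    if project is not None:
        body["project"] = project
    return body


# A project-scoped, preauthenticated URL that is valid for ten minutes.
print(build_download_input(duration=600, project="project-xxxx", preauthenticated=True))
```

Scoping the URL to a project follows the security note in the Inputs section; the ten-minute duration is an arbitrary example that stays above the 5-minute guidance given there.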

File downloads in web applications

To generate non-preauthenticated file download URLs, web applications (running inside web browsers) should make /file-xxxx/download requests to the separate endpoint https://dl.dnanex.us instead of https://api.dnanexus.com. Browser requests to non-preauthenticated file download URLs are authenticated by a URL-specific cookie, set by the API server's response to the /file-xxxx/download route on this separate endpoint.

Non-browser-based applications implementing the above specification, or web applications only needing preauthenticated download URLs, may call /file-xxxx/download on https://api.dnanexus.com.
