Omics Data Catalog

Learn about the Omics Data Catalog API for metadata management, search, and synchronization of structured research data.

A license is required to use the Omics Data Catalog on the DNAnexus Platform. Contact DNAnexus Sales for more information.

The Omics Data Catalog API provides programmatic access to metadata management, search, and synchronization capabilities for structured research data. These APIs complement the standard DNAnexus Platform APIs with specialized endpoints for Omics Data Catalog operations.

Use /system/findDataCatalogs to discover data catalogs available to you.

Data Type Reference

Omics Data Catalog supports specific data types that control which values are accepted, how values are stored and sent through the API, and how they can be used for filtering and searching. Understanding these data types is essential when working with schema definitions, upserting records, and constructing search filters.

For conceptual information about the data catalog data types, see Supported Data Types.

| dataType | Description | Examples |
| --- | --- | --- |
| String | Maximum length 255 characters; any characters. | "tissue", "mus musculus (mouse)", "RiboFree Total RNA Library Kit" |
| LongString | Maximum length 10,000 characters; any characters. Values longer than 255 characters are truncated when requesting data. | Long protocol descriptions, detailed study summaries |
| ID | Minimum length 1, maximum length 40 characters; any characters. | "Iv3-78", "sample-id-234536", "A1B2C3D4E5F6G7H8I9J0K1L" |
| Integer | Minimum -9223372036854775808, maximum 9223372036854775807. Must be passed as a string to ensure precision. | "2", "4712384", "-840000090000" |
| Decimal | Maximum precision of 20 significant digits. Must be passed as a string to ensure precision. | "-7198.8", "0.0000000012", "5.34E-2", "-0.1e4" |
| Date | Must be a valid date in YYYY-MM-DD format, between "0001-01-01" and "9999-12-31". | "2024-01-01", "1999-12-12" |
| DateTime | RFC 3339 format, year 0001 or later. | "2020-01-01T00:00:00+02:00", "2020-01-01T00:00:00.123Z" |

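The limits in the data type reference can be checked on the client before records are sent. The following is a minimal sketch, not part of the API: the function name `is_valid_value` is hypothetical, DateTime is left out for brevity, and counting the digits of the parsed coefficient is only an approximation of the "20 significant digits" rule.

```python
import re
from datetime import date
from decimal import Decimal, InvalidOperation

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_valid_value(data_type: str, value: str) -> bool:
    """Return True if a string value satisfies the documented limits for data_type."""
    if not isinstance(value, str):
        return False  # values are passed as strings, including numbers
    if data_type == "String":
        return len(value) <= 255
    if data_type == "LongString":
        return len(value) <= 10_000
    if data_type == "ID":
        return 1 <= len(value) <= 40
    if data_type == "Integer":
        if not re.fullmatch(r"-?[0-9]+", value):
            return False
        return INT64_MIN <= int(value) <= INT64_MAX
    if data_type == "Decimal":
        try:
            parsed = Decimal(value)
        except InvalidOperation:
            return False
        # Assumption: significant digits are approximated by the coefficient length.
        return parsed.is_finite() and len(parsed.as_tuple().digits) <= 20
    if data_type == "Date":
        if not DATE_RE.fullmatch(value):
            return False
        try:
            date(int(value[:4]), int(value[5:7]), int(value[8:10]))
        except ValueError:
            return False  # for example, February 30 or year 0000
        return True
    raise ValueError(f"unsupported data type in this sketch: {data_type}")
```

The server remains the authority on validity; a check like this only helps catch obvious problems before a request is made.
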
Null Value Handling

Most fields can accept null values when making API calls to /dataCatalog-xxxx/upsertRecords. Pass null as a JSON null value (not the string "null").

Fields that cannot accept null values:

  • System-generated metadata fields, such as created_at, file_name, size

  • Primary ID fields as defined in primaryIdField for the entity

  • Required fields where isRequiredInIngestion is true

Field-specific null behavior:

  • For fields with allowedValues defined: When isRequiredInIngestion is not defined or false, null is implicitly allowed even though it's not included in the allowedValues array.

  • For optional fields without allowedValues: null is accepted and can be used to clear a previously set value.
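
The null rules above can be expressed as a small check against the schema returned by /dataCatalog-xxxx/getSchema. This is a sketch under assumptions: the helper name `null_allowed` is hypothetical, and it treats the schema flag isSystemManaged as the marker for system-generated metadata fields, which the text above does not state explicitly.

```python
import json


def null_allowed(entity: dict, field: dict) -> bool:
    """Apply the three documented exclusions to one schema field."""
    if field.get("isSystemManaged"):  # assumption: marks system-generated fields
        return False
    if field["id"] == entity.get("primaryIdField"):
        return False
    if field.get("isRequiredInIngestion"):
        return False
    return True  # optional field, with or without allowedValues


# Hypothetical entity and fields, shaped like the getSchema output.
entity = {"id": "/sample", "primaryIdField": "/sample/sample_id"}
tissue = {"id": "/sample/tissue_type", "allowedValues": ["blood", "liver"]}
sample_id = {"id": "/sample/sample_id", "isRequiredInIngestion": True}

# Python None serializes to a JSON null, not the string "null".
record_json = json.dumps({"/sample/sample_id": "s-1", "/sample/tissue_type": None})
```
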

Omics Data Catalog API Method Specifications

API Method: /dataCatalog-xxxx/describe

Specification

Gets descriptive information about a specific data catalog.

This method uses the standard DNAnexus Platform API base URL (for example, https://api.dnanexus.com) rather than the data catalog-specific URL.

Inputs

  • fields mapping (optional) Restricts the output of this method to the keys provided in this mapping. If not provided, all fields are returned.

    • key — Desired output field. See the Outputs section below for valid values.

    • value boolean — The value true.

Outputs

All fields are included by default when fields is not provided. The following fields can be individually disabled using fields:

  • id string ID of the data catalog.

  • billTo string ID of the organization to which any costs associated with this data catalog are billed.

  • region string The region this data catalog is in, such as aws:us-east-1.

  • name string The name of the data catalog (typically organization name + region).

  • url string Server URL for the data catalog endpoint.

  • members array of strings IDs of organizations that have been invited to access this data catalog. Only visible to administrators of the billTo organization.

Errors

  • ResourceNotFound

    • The specified data catalog does not exist.

  • InvalidInput

    • Input is not a hash, or fields, if present, is not a hash or has a key whose value is not a boolean.

Additional standard API errors may be returned.
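
As an illustration, a describe call that asks only for name and url could be assembled as follows. This sketch only builds the request object and does not send it. The helper `build_api_request` is hypothetical, and the use of an HTTP POST with a JSON body and a Bearer token header is an assumption based on general DNAnexus API conventions rather than something stated in this section.

```python
import json
import urllib.request


def build_api_request(base_url: str, route: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an API route."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + route,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# describe uses the standard Platform base URL, not the data catalog URL.
request = build_api_request(
    "https://api.dnanexus.com",
    "/dataCatalog-xxxx/describe",
    {"fields": {"name": True, "url": True}},
    token="HYPOTHETICAL_TOKEN",
)
```

Passing the result to `urllib.request.urlopen` would send it; for the catalog-specific methods, the base URL would instead be the url value returned for the data catalog.
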

API Method: /dataCatalog-xxxx/invite

Specification

Invites a DNAnexus organization to the data catalog. The invited organization gains access to view, search, update, and delete metadata according to project permissions. If the organization already has access to the data catalog, no change is made.

This method uses the standard DNAnexus Platform API base URL (for example, https://api.dnanexus.com) rather than the data catalog-specific URL.

Inputs

  • invitee string (required) The organization to receive access to the data catalog. Must be an org ID.

Outputs

  • id string (nullable) Invite ID, or null if the invite did not need to be created. This happens when the invitee already has access to the data catalog.

  • state string State of the invite. Always "accepted" because invitations take effect immediately.

Errors

  • ResourceNotFound

    • invitee is not a valid organization ID or is not an existing DNAnexus org.

  • PermissionDenied

    • Must be an administrator of the billTo organization with a full scope token.

Additional standard API errors may be returned.

API Method: /dataCatalog-xxxx/leave

Specification

Removes an organization's access to the specified data catalog. The billTo organization cannot leave its own data catalog.

This method uses the standard DNAnexus Platform API base URL (for example, https://api.dnanexus.com) rather than the data catalog-specific URL.

Inputs

  • organization string (required) Organization ID. Removes the organization from the data catalog, revoking all access the organization has to the data catalog.

Outputs

  • id string ID of the data catalog from which the organization was removed.

Errors

  • InvalidInput

    • The billTo organization may not leave its own data catalog.

  • ResourceNotFound

    • The specified data catalog does not exist.

  • PermissionDenied

    • A full scope token is required.

    • The requesting user must be an administrator of the organization being removed or an administrator of the billTo organization.

Additional standard API errors may be returned.

API Method: /dataCatalog-xxxx/downloadLoaderTemplates

Specification

Returns zipped CSV files that can be used as templates when creating input for metadata ingestion with the Data Catalog Loader app. The content of these CSV files depends on the schema defined for the data catalog.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

None.

Outputs

Returns a ZIP file containing CSV template files. Each CSV file in the ZIP corresponds to an entity in the schema and contains column headers for all fields of that entity, providing a template for data ingestion.

The HTTP response includes the following HTTP headers:

  • Content-Type: application/zip

  • Content-Disposition: attachment; filename="<file name>.zip"

Errors

This method may return standard API errors.
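
Once the ZIP response body has been downloaded, the template column headers can be listed with the standard library. A minimal sketch, assuming the response bytes are already in memory; the function name `template_headers` and the file names in the usage note are hypothetical.

```python
import csv
import io
import zipfile


def template_headers(zip_bytes: bytes) -> dict:
    """Map each CSV file name in the archive to its list of column headers."""
    headers = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV template
            with archive.open(name) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8-sig", newline="")
                headers[name] = next(csv.reader(text), [])
    return headers
```

For an archive holding a hypothetical sample.csv, the result would map "sample.csv" to the header row of that file.
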

API Method: /dataCatalog-xxxx/downloadRecords

Specification

Downloads metadata records from the data catalog as a CSV file based on search criteria. The requesting user must have at least VIEW access to the projects containing the metadata. For public entities (where isPublicInDataCatalog is true in the schema), all records are returned regardless of the requesting user's project access. Projects with the downloadRestricted flag are excluded from the results.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Limitations

  • The CSV contains up to 250 records.

  • LongString values above 255 characters are truncated.

Inputs

  • entity string (required) The ID of the entity to search.

  • filters mapping (optional) Specifies the filter criteria that matching records must satisfy. Keys are field IDs obtained from the schema, values are the filter conditions. Can be provided in the following ways:

    • A string to match a field value exactly, for example, {"/sample/tissue_type": "blood"}.

    • An OR condition requiring the field to match any of the provided values, for example, {"/sample/status": {"$or": ["active", "processed"]}}.

    • A partial match condition for String, LongString, and ID data types, for example, {"/participant/name": {"$partialMatch": "john"}}.

    • An OR partial match condition requiring partial match against any provided value, for example, {"/analysis/tool": {"$orPartialMatch": ["bwa", "bowtie"]}}.

    • An AND partial match condition requiring partial match against all provided values, for example, {"/sample/description": {"$andPartialMatch": ["tumor", "primary"]}}.

    • Range conditions for Integer, Decimal, Date, and DateTime data types, for example:

      • Inclusive range with both bounds: {"/participant/age": {"$from": "18", "$to": "65"}}.

      • Exclusive range with both bounds: {"/data/file_size": {"$fromExclusive": "1000", "$toExclusive": "10000"}}.

      • Range with only lower bound: {"/participant/age": {"$from": "18"}}.

      • Range with mixed bounds: {"/data/file_size": {"$fromExclusive": "1000", "$to": "10000"}}.
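
The filter forms listed above are plain JSON mappings, so small helper functions can keep them consistent. The helper names below (`any_of`, `partial`, `value_range`) are hypothetical and not part of the API; a range with only an upper bound is assumed to be accepted by analogy with the lower-bound-only example.

```python
def any_of(*values: str) -> dict:
    """Exact match against any of the given values ($or)."""
    return {"$or": list(values)}


def partial(*terms: str, match_all: bool = False) -> dict:
    """Partial match on one term, or on any/all of several terms."""
    if not terms:
        raise ValueError("at least one term is required")
    if len(terms) == 1:
        return {"$partialMatch": terms[0]}
    return {"$andPartialMatch" if match_all else "$orPartialMatch": list(terms)}


def value_range(lower=None, upper=None, *, lower_inclusive=True, upper_inclusive=True) -> dict:
    """Range condition; bounds are sent as strings, as in the examples above."""
    condition = {}
    if lower is not None:
        condition["$from" if lower_inclusive else "$fromExclusive"] = str(lower)
    if upper is not None:
        condition["$to" if upper_inclusive else "$toExclusive"] = str(upper)
    if not condition:
        raise ValueError("at least one bound is required")
    return condition


# A filters mapping combining an exact match, an OR condition, and a range.
filters = {
    "/sample/tissue_type": "blood",
    "/sample/status": any_of("active", "processed"),
    "/participant/age": value_range("18", "65"),
}
```
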

Outputs

Returns a CSV file containing the search results with entity fields as columns.

The HTTP response includes the following HTTP headers:

  • Content-Type: text/csv

  • Content-Disposition: attachment; filename="<file name>.csv"

Errors

  • InvalidInput

    • The filters parameter contains invalid field IDs or malformed filter conditions.

  • ResourceNotFound

    • The entity does not exist in the schema.

Additional standard API errors may be returned.

API Method: /dataCatalog-xxxx/downloadSchema

Specification

Exports the schema following the Data Model Loader's Data Dictionary file format.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

None.

Outputs

Returns a CSV file containing the schema definition in Data Dictionary format. The CSV file contains the following columns: entity, name, type, display_name, is_system_managed, required_in_ingestion, referenced_entity, description.

The HTTP response includes the following HTTP headers:

  • Content-Type: text/csv

  • Content-Disposition: attachment; filename="<file name>.csv"

Errors

This method may return standard API errors.
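
The exported CSV can be read with the standard csv module using the column names listed above. A minimal sketch, assuming the CSV text is already in memory; the function name `fields_by_entity` is hypothetical, and the sample rows in the usage note are invented for illustration.

```python
import csv
import io


def fields_by_entity(csv_text: str) -> dict:
    """Group (name, type) pairs from a Data Dictionary CSV by entity."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped.setdefault(row["entity"], []).append((row["name"], row["type"]))
    return grouped


# Hypothetical export with two fields on one entity.
example_csv = (
    "entity,name,type,display_name,is_system_managed,"
    "required_in_ingestion,referenced_entity,description\n"
    "sample,sample_id,ID,Sample ID,,,,Primary identifier\n"
    "sample,tissue_type,String,Tissue Type,,,,\n"
)
grouped = fields_by_entity(example_csv)
```
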

API Method: /dataCatalog-xxxx/findRecords

Specification

Searches for records in the data catalog that match specified criteria. The requesting user must have at least VIEW access to the projects containing the metadata. For public entities (where isPublicInDataCatalog is true in the schema), all records are returned regardless of the requesting user's project access.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Limitations

  • LongString fields longer than 255 characters are truncated in results.

Inputs

  • entity string (required) The ID of the entity to search.

  • filters mapping (optional) Specifies the filter criteria that matching records must satisfy. Keys are field IDs obtained from the schema, values are the filter conditions. Can be provided in the following ways:

    • A string to match a field value exactly, for example, {"/sample/tissue_type": "blood"}.

    • An OR condition requiring the field to match any of the provided values, for example, {"/sample/status": {"$or": ["active", "processed"]}}.

    • A partial match condition for String, LongString, and ID data types, for example, {"/participant/name": {"$partialMatch": "john"}}.

    • An OR partial match condition requiring partial match against any provided value, for example, {"/analysis/tool": {"$orPartialMatch": ["bwa", "bowtie"]}}.

    • An AND partial match condition requiring partial match against all provided values, for example, {"/sample/description": {"$andPartialMatch": ["tumor", "primary"]}}.

    • Range conditions for Integer, Decimal, Date, and DateTime data types, for example:

      • Inclusive range with both bounds: {"/participant/age": {"$from": "18", "$to": "65"}}.

      • Exclusive range with both bounds: {"/data/file_size": {"$fromExclusive": "1000", "$toExclusive": "10000"}}.

      • Range with only lower bound: {"/participant/age": {"$from": "18"}}.

      • Range with mixed bounds: {"/data/file_size": {"$fromExclusive": "1000", "$to": "10000"}}.

  • limit integer (optional) Maximum number of results to return per page. Defaults to 50. Must be between 1 and 200.

  • starting string (optional) Pagination token to retrieve subsequent results. The value from next in the response of a prior call.

Outputs

  • results array of mappings The matching records, each containing:

    • internal_id string The globally unique internal ID of the record.

    • describe mapping The found record with field IDs as keys and field values as values. The values can be strings, arrays of strings, or null, representing the schema field data types.

  • resultSchema array of mappings Description of fields returned in the describe field, ordered for display:

    • field string The field ID.

    • aggregation string The aggregation type applied to field values. Supports "list" (array of values with consistent sorting across fields).

  • totalResults integer (nullable) Total number of results matching the input parameters (may exceed returned results). Returns null when the count cannot be determined.

  • next string (nullable) Pagination token for the next set of results, or null if no more results are available.

  • previous string (nullable) Pagination token for the previous set of results, or null if no prior results exist.

Errors

  • InvalidInput

    • The filters parameter contains invalid field IDs or malformed filter conditions.

    • The limit parameter is not between 1 and 200 (inclusive).

    • The starting parameter is invalid or expired.

  • ResourceNotFound

    • The entity does not exist in the schema.

Additional standard API errors may be returned.
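
Pagination with limit, starting, and next can be wrapped in a generator. In this sketch the HTTP call is injected as a function so the loop itself stays independent of any client library; `iter_records` and `fake_find_records` are hypothetical names, and the fake simply stands in for a real /dataCatalog-xxxx/findRecords call.

```python
from typing import Callable, Iterator, Optional


def iter_records(
    find_records: Callable[[dict], dict],
    entity: str,
    filters: Optional[dict] = None,
    limit: int = 200,
) -> Iterator[dict]:
    """Yield every matching record by following the next pagination token."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    payload = {"entity": entity, "limit": limit}
    if filters:
        payload["filters"] = filters
    while True:
        page = find_records(payload)
        yield from page["results"]
        if page.get("next") is None:
            return  # no more pages
        payload = {**payload, "starting": page["next"]}


def fake_find_records(payload: dict) -> dict:
    """Stand-in for the API call: two pages keyed by the starting token."""
    pages = {
        None: {"results": [{"internal_id": "a"}, {"internal_id": "b"}], "next": "token-1"},
        "token-1": {"results": [{"internal_id": "c"}], "next": None},
    }
    return pages[payload.get("starting")]


ids = [r["internal_id"] for r in iter_records(fake_find_records, "/sample", limit=2)]
```

Because pagination tokens can expire, a long-running consumer may need to handle an InvalidInput error by restarting the search.
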

API Method: /dataCatalog-xxxx/findRelatedRecords

Specification

Gets records that are related to the specified record. For each entity, up to 100 records are returned. The chain of related entities is constructed based on the schema relationships. For public entities (where isPublicInDataCatalog is true in the schema), all records are returned regardless of the requesting user's project access.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

  • entity string (required) The ID of the entity containing the record.

  • internal_id string (required) The internal ID of the record for which you want to return related entities. Use the /dataCatalog-xxxx/findRecords API method to look up the internal ID of a record.

Outputs

  • array of mappings List of entities with related records, each containing:

    • entity string The entity ID.

    • foundRecords string Indicates result completeness.

      • Must be one of "all", "thereAreMore", or "thereMightBeMore":

        • "all" — 100 or fewer records were found. All related records were returned.

        • "thereAreMore" — 101 or more records were found. First 100 were returned.

        • "thereMightBeMore" — Records were searched based on an entity with limited results. There might be additional records not found due to the limitation.

    • results array of mappings The related records, each containing:

      • internal_id string The internal ID of the related record.

      • describe mapping The found record with field IDs as keys and field values as values. Includes the fields dx_project_id, name, and the primary ID.

Errors

  • ResourceNotFound

    • The entity does not exist in the schema.

    • The specified internal_id does not exist in the entity.

Additional standard API errors may be returned.

API Method: /dataCatalog-xxxx/getFilters

Specification

Gets a list of fields that can be used to filter records of a given entity. Useful when searching with the /dataCatalog-xxxx/findRecords or /dataCatalog-xxxx/downloadRecords API methods.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

  • entity string (required) The ID of the entity to search.

Outputs

  • fields array of mappings List of fields that can be used in the filters parameter when searching the entity.

    • id string The field ID (use as keys in the filters mapping).

Errors

  • ResourceNotFound

    • The entity does not exist in the schema.

Additional standard API errors may be returned.
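
One use of this method is to check a filters mapping before searching, so that unknown field IDs are caught before they cause an InvalidInput error. A minimal sketch; the function name `unknown_filter_keys` and the sample response are hypothetical.

```python
def unknown_filter_keys(filters: dict, get_filters_response: dict) -> list:
    """Return filter keys that are not listed in the getFilters output."""
    allowed = {field["id"] for field in get_filters_response["fields"]}
    return sorted(key for key in filters if key not in allowed)


# Hypothetical getFilters response for the "/sample" entity.
response = {"fields": [{"id": "/sample/tissue_type"}, {"id": "/sample/status"}]}
unknown = unknown_filter_keys(
    {"/sample/tissue_type": "blood", "/sample/colour": "red"}, response
)
```
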

API Method: /dataCatalog-xxxx/getProjectSyncStatus

Specification

Gets the current synchronization status for a project.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

  • projectId string (required) The ID of the project to check.

Outputs

  • syncState string The current sync state.

    • Must be one of "SYNC_REQUESTED", "SYNCING", or "IDLE".

  • autoSyncEnabled boolean Whether automatic synchronization is enabled for this project.

  • lastProjectSync string The time the last project sync completed, in RFC 3339 format, for example, "2025-04-03T22:01:15.000Z".

  • lastSyncInvokedAt string The time the last sync was initiated, in RFC 3339 format, for example, "2025-04-03T22:01:00.000Z".

Errors

  • ResourceNotFound

    • The specified projectId does not exist or is not associated with this data catalog.

Additional standard API errors may be returned.

API Method: /dataCatalog-xxxx/getSchema

Specification

Gets the schema defined for a specific data catalog, that is, its entities, fields, and relationships.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

None.

Outputs

  • entities array of mappings List of entities, each with the following fields:

    • id string The unique identifier of the entity. Examples: "/sample", "/analysis".

    • displayName string The human-readable name used when referring to a single record of this entity type. Examples: "Sample", "Analysis", "Participant".

    • displayNamePlural string The human-readable name used when referring to multiple records of this entity type. Examples: "Samples", "Analyses", "Participants".

    • description string A human-readable description of the entity.

    • isPublicInDataCatalog boolean When true, all records of this entity are visible to all users with access to the data catalog, regardless of project permissions. Record IDs for public entities must be unique across the entire data catalog. When not specified or false, records follow standard project-based access controls.

    • primaryIdField string The ID of the field that contains the primary identifier for records of this entity. Example: "/sample/sample_id".

    • nameField string The ID of the field that contains the human-readable display name, used together with primaryIdField to describe each record. Example: "/sample/sample_name".

    • fields array of mappings The fields belonging to this entity:

      • id string The unique identifier of the field across the entire schema. Example: "/analysis/sequencing_method".

      • dataType string The data type of the field value. See Schema Field Data Types.

      • displayName string The human-readable name of the field. Example: "Sequencing Method".

      • description string A human-readable description of the field.

      • allowedValues array of strings The permitted values for this field. When the field is not required during ingestion (that is, isRequiredInIngestion is not defined or false), null value is also allowed but not included in this array.

      • suggestions array of strings Suggested values for use as search filters.

      • isHiddenFromResultsTable boolean Whether the field is hidden from result tables in the UI.

      • isDataObjectId boolean Whether the field contains a DNAnexus data object ID.

      • isDataObjectProjectId boolean Whether the field contains a DNAnexus data object project ID. Applicable only when the entity is a data object and the field is project ID.

      • isExecutionId boolean Whether the field contains a DNAnexus execution ID (job or analysis ID).

      • isRequiredInIngestion boolean Whether the field value must be provided when ingesting records.

      • isSystemManaged boolean Whether the field value is managed by the system and cannot be modified by users.

      • linkedField mapping Defines a link to another entity:

        • entity string The ID of the referenced entity, such as "/sample".

        • field string The ID of the referenced field, such as "/sample/sample_id".

Errors

This method may return standard API errors.
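
The schema output can be walked to find out which fields a record needs before it is upserted. A minimal sketch over a hypothetical schema fragment; the function name `required_fields` is not part of the API.

```python
def required_fields(schema: dict) -> dict:
    """Map each entity ID to its primary ID field plus fields required in ingestion."""
    result = {}
    for entity in schema["entities"]:
        required = {entity["primaryIdField"]}
        for field in entity.get("fields", []):
            if field.get("isRequiredInIngestion"):
                required.add(field["id"])
        result[entity["id"]] = sorted(required)
    return result


# Hypothetical fragment shaped like the getSchema output.
schema = {
    "entities": [
        {
            "id": "/sample",
            "primaryIdField": "/sample/sample_id",
            "fields": [
                {"id": "/sample/sample_id", "dataType": "ID"},
                {"id": "/sample/tissue_type", "dataType": "String", "isRequiredInIngestion": True},
                {"id": "/sample/notes", "dataType": "LongString"},
            ],
        }
    ]
}
required = required_fields(schema)
```
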

API Method: /dataCatalog-xxxx/removeRecords

Specification

Removes specified records from the data catalog. If a record with the specified primary ID does not exist, it is skipped, not counted, and does not generate an error.

You can remove multiple records from multiple entities in one request. If some record removals fail due to an error, the other records are still removed. The operation is not atomic.

The requesting user must have at least CONTRIBUTE permission for the project the metadata is associated with. For removing records from protected projects, the requesting user must have ADMINISTER access. For details, see Access Control and Permissions.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Limitations

Maximum 5,000 records or 1 MB body size per request.

Inputs

  • projectId string (required) The ID of the project the records are associated with.

  • dryRun boolean (optional) Whether to only validate the inputs without making changes.

  • entities mapping (required) The records to be removed, grouped by entity:

    • key — the entity ID.

    • value mapping — the records to remove for that entity:

      • ids array of strings (required) The IDs of records to be removed.

Outputs

Returns a mapping with entity IDs as keys and removal results as values:

  • key — the entity ID provided in the input.

  • value mapping — results for each entity:

    • removed integer The number of records that were successfully removed (does not include non-existent records).

    • errors array of mappings Information about records that failed to be removed:

      • index integer The index of the failed record in the input array.

      • message string Description of the error.

Errors

  • InvalidInput

    • The projectId is not a valid project ID.

    • The entities parameter contains invalid entity IDs or malformed record identifiers.

  • PermissionDenied

    • The requesting user does not have at least CONTRIBUTE permission for the project.

    • For protected projects, the requesting user does not have ADMINISTER access.

Additional standard API errors may be returned.
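
Larger removals need to be split to respect the 5,000-record limit. The sketch below builds request payloads in batches; the function name `removal_payloads` is hypothetical, and the 1 MB body limit is not checked here, so very long IDs could still require smaller batches.

```python
from typing import Iterator


def removal_payloads(
    project_id: str, ids_by_entity: dict, max_records: int = 5000, dry_run: bool = False
) -> Iterator[dict]:
    """Yield removeRecords payloads with at most max_records IDs each."""
    batch, count = {}, 0
    for entity, ids in ids_by_entity.items():
        for record_id in ids:
            if count == max_records:
                yield {"projectId": project_id, "dryRun": dry_run, "entities": batch}
                batch, count = {}, 0
            batch.setdefault(entity, {"ids": []})["ids"].append(record_id)
            count += 1
    if batch:
        yield {"projectId": project_id, "dryRun": dry_run, "entities": batch}


# Small limit used only to make the batching visible.
payloads = list(
    removal_payloads(
        "project-xxxx",
        {"/sample": ["s1", "s2", "s3"], "/analysis": ["a1"]},
        max_records=2,
    )
)
```

Because the operation is not atomic, the errors array of each response should still be inspected per batch.
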

API Method: /dataCatalog-xxxx/syncProject

Specification

Triggers a sync process for the specified project. After successful sync, all data objects present in the project are reflected as records in the data object entity in the data catalog with updated system-generated metadata.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

  • projectId string (required) The ID of the project to synchronize.

Outputs

  • accepted string Always contains "ok" when the sync request is successfully accepted (HTTP status code 202).

Errors

  • InvalidInput

    • The projectId is not a valid project ID.

  • ResourceNotFound

    • The specified projectId does not exist or is not associated with this data catalog.

  • InvalidState

    • A sync is already in progress for this project.

  • PermissionDenied

    • The requesting user does not have permission to sync the project.

Additional standard API errors may be returned.
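
Because the sync request is only accepted, not completed, a caller may want to poll /dataCatalog-xxxx/getProjectSyncStatus afterwards. The following sketch injects the API call and the sleep function so it can run without a network; `sync_and_wait` is a hypothetical helper, and it assumes that the reported state leaves "IDLE" promptly once the request is accepted.

```python
import time
from typing import Callable


def sync_and_wait(
    call: Callable[[str, dict], dict],
    project_id: str,
    poll_seconds: float = 30.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Trigger a sync, then poll the sync status until it reports IDLE."""
    call("syncProject", {"projectId": project_id})
    for _ in range(max_polls):
        status = call("getProjectSyncStatus", {"projectId": project_id})
        if status["syncState"] == "IDLE":
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"sync of {project_id} did not finish after {max_polls} polls")


# Stand-in for the real API: reports two busy states, then IDLE.
_states = iter(["SYNC_REQUESTED", "SYNCING", "IDLE"])
calls = []


def fake_call(method: str, payload: dict) -> dict:
    calls.append(method)
    if method == "syncProject":
        return {"accepted": "ok"}
    return {"syncState": next(_states), "autoSyncEnabled": False}


final_status = sync_and_wait(fake_call, "project-xxxx", sleep=lambda seconds: None)
```

A real caller would also need to handle the InvalidState error returned when a sync is already in progress.
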

API Method: /dataCatalog-xxxx/updateProjectSync

Specification

Updates the automatic synchronization setting for a project. This controls whether the project automatically syncs metadata changes with the Omics Data Catalog. When enabled, automatic synchronization runs every 6 hours.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Inputs

  • projectId string (required) The ID of the project to update.

  • autoSyncEnabled boolean (required) Whether automatic synchronization should be enabled for this project.

Outputs

  • projectId string The ID of the project that was updated.

  • autoSyncEnabled boolean The current status of the automatic synchronization setting.

Errors

  • InvalidInput

    • The projectId is not a valid project ID.

    • The autoSyncEnabled parameter is not a boolean value.

  • ResourceNotFound

    • The specified projectId does not exist or is not associated with this data catalog.

  • PermissionDenied

    • The requesting user does not have permission to modify the project sync settings.

Additional standard API errors may be returned.

API Method: /dataCatalog-xxxx/upsertRecords

Specification

Inserts or updates records in the data catalog. If a record with the same primary ID and project ID already exists, the values are updated. If an existing record is not found, a new record is inserted using the provided ID.

For public entities (where isPublicInDataCatalog is true in the schema), record IDs must be unique across the entire data catalog, not just within a project. Attempting to insert a record with an ID that already exists in a different project results in an error for that record.

You can upsert multiple records into multiple entities in one request. If some record updates fail due to an error, the other records are still inserted or modified. The operation is not atomic.

The requesting user must have at least UPLOAD permission for the project the metadata is associated with. For updating instances with null values, the requesting user must have CONTRIBUTE access in normal projects and ADMINISTER access in protected projects. For details, see Access Control and Permissions.

This API method uses the data catalog URL returned by the /system/findDataCatalogs API method as the base URL.

Limitations

Maximum 5,000 records or 1 MB body size per request.

Inputs

  • projectId string (required) The ID of the project the records are associated with.

  • dryRun boolean (optional) Whether to only validate the inputs without making changes.

  • entities mapping (required) Specifies the records to be upserted, with entity IDs as keys and arrays of records as values:

    • key — the entity ID.

    • value array of mappings — records to be inserted or updated for the entity. The primary field ID (as defined in primaryIdField for the entity) must always be provided to identify the record.

      • <field id> Field values as strings that can be parsed as the field's data type, or null. If a field is not provided, the field value is not modified.

Outputs

Returns a mapping with entity IDs as keys and processing results as values:

  • key — the entity ID provided in the input.

  • value mapping — results for each entity:

    • ok integer The number of records processed successfully.

    • errors array of mappings Information about records that failed to process:

      • index integer The index of the failed record in the input array.

        • Only present when the error applies to a specific record. If absent, the error applies to the entire entity.

      • message string Description of the error.

Errors

  • InvalidInput

    • The projectId is not a valid project ID.

    • The entities parameter contains invalid entity IDs.

    • A field value cannot be parsed as the field's data type.

    • A required field is missing from a record.

    • For public entities, a record ID already exists in a different project (IDs must be unique across the catalog).

    • The request exceeds the maximum of 5,000 records or 1 MB body size.

  • PermissionDenied

    • The requesting user does not have at least UPLOAD permission for the project.

    • For updating instances with null values, the requesting user does not have CONTRIBUTE access in normal projects or ADMINISTER access in protected projects.

Additional standard API errors may be returned.
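
Since the operation is not atomic, a caller usually needs to map the per-entity errors back to the records that were sent. The sketch below shows a request payload and one way to do that mapping. The helper name `failed_records`, the field IDs, and the sample response are hypothetical, and index is assumed to be a zero-based position in the submitted array.

```python
def failed_records(entities: dict, response: dict) -> list:
    """Return (entity, record or None, message) for every reported error."""
    failures = []
    for entity, result in response.items():
        for error in result.get("errors", []):
            index = error.get("index")
            # Without an index, the error applies to the entire entity.
            record = entities[entity][index] if index is not None else None
            failures.append((entity, record, error["message"]))
    return failures


payload = {
    "projectId": "project-xxxx",
    "dryRun": True,
    "entities": {
        "/sample": [
            {"/sample/sample_id": "s-1", "/sample/tissue_type": "blood"},
            {"/sample/sample_id": "s-2", "/sample/tissue_type": None},  # clears the value
        ]
    },
}

# Hypothetical response: the second record was rejected.
response = {"/sample": {"ok": 1, "errors": [{"index": 1, "message": "example error"}]}}
failures = failed_records(payload["entities"], response)
```

Setting dryRun to true, as in the payload above, validates the input without making changes, which can be useful before a large ingestion.
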
