Omics Data Catalog
A license is required to use the Omics Data Catalog on the DNAnexus Platform. Contact DNAnexus Sales for more information.
The Omics Data Catalog is a metadata management system that addresses a core challenge in scientific research: data discoverability. The catalog holds structured metadata about the research data you store on the Platform, making that data more findable, accessible, interoperable, and reusable (FAIR). Unlike standard platform metadata (key-value pairs), the catalog creates relationships between customizable entities, such as studies, samples, assays, and data files, enabling search across projects. Data catalogs can be shared across multiple organizations, enabling cross-organizational collaboration while maintaining project-based access controls.
How does the Omics Data Catalog work?
The catalog stores structured information about your data but does not manage the actual files. It relies on a schema that is customized to your research needs. For example, the schema might define connections between different entities, such as participants, samples, assays, and files, as well as specific fields of those entities.
Metadata must be explicitly added to the data catalog through the Data Catalog Loader, API calls to /dataCatalog-xxxx/upsertRecords, or project synchronization for data objects.
Metadata does not appear in the catalog automatically. To populate it from data objects, enable project synchronization, ideally only on curated, catalog-ready projects so that unwanted data objects are not added. For sync behavior and cleanup options, see Controlling Project Synchronization. Unlike standard platform metadata (tags and properties), which authorized users can edit, catalog metadata cannot be modified through the catalog interface. This protects data integrity and consistency and upholds organizational standards, because all metadata must conform to the predefined schema.
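As a rough illustration of what schema conformance means, the following sketch checks a record against a small schema before ingestion. The schema layout, the entity and field names, and the validate_record helper are all hypothetical. The catalog's real schema format and validation rules are defined by your organization and are not reproduced here.

```python
from datetime import date

# Hypothetical schema: entity name -> {field name: expected Python type}.
# This layout is an assumption for illustration, not the catalog's schema format.
SCHEMA = {
    "sample": {"sample_id": str, "collection_date": date, "participant_id": str},
    "assay": {"assay_id": str, "assay_type": str, "sample_id": str},
}


def validate_record(entity, record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    if entity not in schema:
        return [f"unknown entity: {entity}"]
    fields = schema[entity]
    problems = []
    # Every schema field must be present and have the expected type.
    for name, expected in fields.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    # Fields that the schema does not define are rejected.
    for name in record:
        if name not in fields:
            problems.append(f"field not in schema: {name}")
    return problems


good = {"sample_id": "S-001", "collection_date": date(2024, 5, 1), "participant_id": "P-01"}
bad = {"sample_id": "S-002", "collection_date": "2024-05-01", "tissue": "liver"}

print(validate_record("sample", good))
print(validate_record("sample", bad))
```

The first record conforms, so its problem list is empty. The second is reported for a wrongly typed date, a missing participant link, and a field the schema does not define.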
You cannot share metadata with users who don't have access to the project associated with that metadata. To view and search records, users must belong to an organization with access to the data catalog (either the billTo organization or an invited organization) and have appropriate project permissions.
Organizations typically choose a data administrator who controls the metadata ingestion process and any subsequent changes.
Access Control Model
Metadata is stored centrally within a data catalog, but access follows your existing project permissions. You can only see and search metadata from projects where you have at least VIEW access. This means the same search may show different results to different users based on their project permissions.
For cross-project linking and discovery, you can mark certain entities as public within the data catalog schema. Public entities make metadata visible to all data catalog users, regardless of their project permissions. This allows for broader discoverability of reference entities, such as analyses or protocols, while still enforcing project-level access controls. However, users still need appropriate project permissions to access the underlying data objects referenced by public entity records.
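The visibility rule described above can be sketched in a few lines. This is a simplified model, not the Platform's implementation: the record layout, the project IDs, and the visible_records helper are assumptions made for illustration.

```python
# Hypothetical data: entities the schema marks as public, and a few records.
# All names and IDs are illustrative.
PUBLIC_ENTITIES = {"protocol"}

RECORDS = [
    {"entity": "sample", "id": "S-001", "project": "project-A"},
    {"entity": "sample", "id": "S-002", "project": "project-B"},
    {"entity": "protocol", "id": "PR-01", "project": "project-B"},
]


def visible_records(records, viewable_projects, public_entities=PUBLIC_ENTITIES):
    """Keep records from viewable projects plus records of public entities."""
    return [
        r for r in records
        if r["project"] in viewable_projects or r["entity"] in public_entities
    ]


# Two users run the same search with different project permissions.
alice = visible_records(RECORDS, {"project-A"})
bob = visible_records(RECORDS, {"project-A", "project-B"})

print([r["id"] for r in alice])
print([r["id"] for r in bob])
```

In this model, the user with VIEW access to only one project sees that project's sample plus the public protocol record, while the user with access to both projects sees all three records.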
Data catalogs can be shared across multiple organizations. Users from invited organizations can view, search, update, and delete metadata according to their project permissions. Modifications to the data catalog require the appropriate project permissions, regardless of whether the user is in the billTo or invited organization.
For detailed information about permissions, public entities, and collaboration, see Concepts and Architecture.
Omics Data Catalog Metadata vs. Standard Platform Metadata
The DNAnexus Platform offers two distinct metadata systems, each designed for different use cases:
Data Structure
Standard Platform Metadata: Key-value pairs and tags
Omics Data Catalog Metadata: Structured entities with defined relationships
Search Scope
Standard Platform Metadata: Cross-project search with permission filtering via API only
Omics Data Catalog Metadata: Cross-project search with permission filtering via API and UI
Data Relationships
Standard Platform Metadata: No connections between tags or properties
Omics Data Catalog Metadata: Structured relationships (for example, participant → sample → assay → file)
Schema Enforcement
Standard Platform Metadata: None
Omics Data Catalog Metadata: Enforced schema standards with type constraints
Use Case
Standard Platform Metadata: Basic file metadata (size, type, name) plus unstructured tags and properties
Omics Data Catalog Metadata: Complex omics metadata (assay date, lab info, sample details)
Data Discovery
Standard Platform Metadata: Manual browsing within projects (UI) and cross-project search via API
Omics Data Catalog Metadata: Faceted filtering across multiple projects
Metadata Modification
Standard Platform Metadata: Users can modify tags and properties
Omics Data Catalog Metadata: Permission-based modification (read-only in the UI)
Using the Omics Data Catalog
Prerequisites
Your organization has access to an Omics Data Catalog (either as the billTo organization or an invited organization).
You have VIEW access to at least one project associated with cataloged metadata.
Finding Your Data
Complex research questions often require filtering across multiple fields from different entities in your schema. The catalog supports filtering that combines cross-entity criteria to help you identify specific datasets.
1. In the DNAnexus Platform, click Data Catalog.
2. On the Search tab, use the left panel to filter by specific fields.
3. Repeat step 2 as needed to filter by multiple fields, which can be from multiple entities.
The filters for different fields are grouped using AND logic, meaning all conditions must be true for a record to be included in the results. After finding the data you need, you can copy data objects to your project for analysis.
For example, you can hover over an entity field, such as the Therapeutic Area (categorical string field) of a Study (entity), and click Add as Filter.
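The AND grouping of filters on different fields can be modeled as follows. This is only a sketch of the logic: the study records, the field names, and the apply_filters helper are hypothetical and say nothing about how the catalog evaluates searches internally.

```python
# Hypothetical study records; field names and values are illustrative only.
STUDIES = [
    {"study_id": "ST-1", "therapeutic_area": "Oncology", "phase": "II"},
    {"study_id": "ST-2", "therapeutic_area": "Oncology", "phase": "III"},
    {"study_id": "ST-3", "therapeutic_area": "Cardiology", "phase": "II"},
]


def apply_filters(records, filters):
    """Keep records that satisfy every filter (AND across fields).

    `filters` maps a field name to the value that field must equal.
    """
    return [
        r for r in records
        if all(r.get(field) == value for field, value in filters.items())
    ]


matches = apply_filters(STUDIES, {"therapeutic_area": "Oncology", "phase": "II"})
print([r["study_id"] for r in matches])
```

Only the study that satisfies both conditions is returned. Each additional filter can narrow the result set but never widen it, which is why removing filters one at a time is a useful way to diagnose an empty result.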

What if search results don't match your expectations?
When searches don't return expected results, verify that filter combinations aren't overly restrictive by removing criteria systematically. Remember that your project permissions automatically filter all results, so contact project administrators if you expect to see additional data. Also, understanding your organization's schema helps identify why certain relationships or combinations might not exist as expected.
Navigating Linked Metadata
The catalog's power comes from modeling research data as interconnected entities rather than isolated files. By navigating these relationships, you can explore data across the complete research workflow, from study participants through sample collection to final analysis outputs.
This relationship-aware approach eliminates the work typically required with file-based data organization systems. Instead of manually correlating metadata across spreadsheets, you can navigate directly through the data model to build complete pictures of your research assets.
To find metadata linked to a specific record:
Click the record to open its details.
In the record's details on the right, filter by the linked entities displayed at the top of the list.
For example, within the details of a specific sample, you can click Filter Linked Study to view related data objects. Be aware that this replaces your existing filtering criteria.
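Conceptually, navigating linked metadata is a walk over a graph of records. The sketch below follows links from a participant down to data objects. The record IDs, the entity names, the "links" layout, and the linked_records helper are all hypothetical and do not reflect how the catalog stores relationships.

```python
# Hypothetical linked records; IDs, entity names, and the "links" layout are
# illustrative and do not reflect the catalog's storage format.
RECORDS = {
    "P-01": {"entity": "participant", "links": ["S-001", "S-002"]},
    "S-001": {"entity": "sample", "links": ["A-10"]},
    "S-002": {"entity": "sample", "links": []},
    "A-10": {"entity": "assay", "links": ["file-1", "file-2"]},
    "file-1": {"entity": "data_object", "links": []},
    "file-2": {"entity": "data_object", "links": []},
}


def linked_records(start_id, target_entity, records=RECORDS):
    """Follow links from start_id and collect IDs of the target entity type."""
    found, seen, stack = [], set(), [start_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting records reachable by two paths
        seen.add(current)
        record = records[current]
        if record["entity"] == target_entity and current != start_id:
            found.append(current)
        stack.extend(record["links"])
    return sorted(found)


print(linked_records("P-01", "data_object"))
print(linked_records("P-01", "sample"))
```

Starting from one participant, the walk reaches both samples directly and both files through the sample and assay records, without any manual correlation across separate tables.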

Whenever records link to data objects, such as files, you can open the files directly from the catalog by clicking the links in the record details. Or, when viewing Data Object entities, you can click their IDs directly in the search results table.
Use the Undo filters and Redo filters buttons in the top right to step backward and forward through your filter history.

Working with Search Results
After finding relevant data through search and navigation, you can copy, share, or export your results.
Copy Data Objects to Your Project
To work with data objects from your search results:
Switch to the Data Objects entities view in the search results.
Select the data objects you want to copy.
Click Copy to Project.

This copies the actual files into your project, where you can analyze them or use them as inputs for workflows.
Share and Export Results
You can share your search results with colleagues in two ways:
Export Results: Select specific records and export them as CSV files using the Export button. The exported file includes all entity fields, even those not visible in the current table view.
Share URLs: Click Copy URL to share your search state. Recipients see results filtered by their own project permissions, ensuring secure collaboration without exposing data they shouldn't access.
The export is limited to 250 records. For searches returning more than 250 records, add filters to narrow your results, or use the /datacatalog-xxxx/findRecords API method with pagination to retrieve larger datasets programmatically.
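For larger result sets, a pagination loop might look like the sketch below. The request and response shape of findRecords is not described here, so the example takes a hypothetical fetch_page callable that would wrap the API call. The cursor-based paging, the parameter names, and the fake data source are assumptions made so the example can run on its own.

```python
def fetch_all_records(fetch_page, page_size=250):
    """Collect every record by requesting pages until no cursor is returned.

    `fetch_page(cursor, limit)` is a hypothetical callable that would wrap the
    findRecords API call and return (records, next_cursor). The cursor-based
    shape is an assumption; check the API reference for the real parameters.
    """
    records, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor, page_size)
        records.extend(page)
        if cursor is None:
            return records


# Stand-in for the API so the sketch runs without network access.
FAKE_DATA = [{"id": f"rec-{i}"} for i in range(600)]


def fake_fetch_page(cursor, limit):
    start = cursor or 0
    end = start + limit
    next_cursor = end if end < len(FAKE_DATA) else None
    return FAKE_DATA[start:end], next_cursor


all_records = fetch_all_records(fake_fetch_page)
print(len(all_records))
```

Keeping the paging loop separate from the function that performs the request makes it easy to replace the stand-in with a real API call once you know the actual parameters.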
Exploring Catalog Schema
Understanding the data catalog schema helps you use the catalog more effectively.
The Schema tab provides an interactive map of how different types of metadata connect to each other in your organization's catalog, visualizing the following:
Entities - The different categories of research data your organization tracks, such as Studies, Participants, Samples, Assays, or Data Objects.
Fields - The specific information stored for each entity type, such as collection date for samples, or assay type for experiments.
Relationships - How different entity types connect to each other, such as how participants relate to samples, or how samples connect to analysis files.

On the Schema tab, you can download the whole schema as a CSV file, or download CSV templates for individual entity types to simplify ingestion with the Data Catalog Loader app.
Next Steps
Understand the concepts and architecture of Omics Data Catalog.
Ingest metadata using the Data Catalog Loader app.