Omics Data Catalog

A license is required to use the Omics Data Catalog on the DNAnexus Platform. Contact DNAnexus Sales for more information.

Omics Data Catalog is a metadata management system that addresses a core challenge in scientific research: data discoverability. The catalog holds structured metadata about the research data you store on the Platform, making that data more findable, accessible, interoperable, and reusable (FAIR). Unlike standard platform metadata (key-value pairs), the catalog creates relationships between customizable entities, such as studies, samples, assays, and data files, enabling search across projects. Data catalogs can be shared across multiple organizations, enabling cross-organizational collaboration while maintaining project-based access controls.

How does the Omics Data Catalog work?

The catalog stores structured information about your data but does not manage the actual files. It relies on a schema that is customized to your research needs. For example, the schema might define connections between different entities, such as participants, samples, assays, and files, as well as specific fields of those entities.
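
As a rough illustration of what such a schema expresses, the sketch below models entities, their fields, and the links between them in plain Python. It is not the catalog's actual schema format: the `Entity` class, the field names, and the `downstream` helper are all hypothetical and exist only to make the participant, sample, assay, file chain concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One entity type in a hypothetical catalog schema."""
    name: str
    fields: dict[str, type]                             # field name -> expected type
    links_to: list[str] = field(default_factory=list)   # names of downstream entities


# Illustrative schema only: participant -> sample -> assay -> file.
SCHEMA = {
    "participant": Entity("participant", {"participant_id": str, "age": int}, ["sample"]),
    "sample": Entity("sample", {"sample_id": str, "collection_date": str}, ["assay"]),
    "assay": Entity("assay", {"assay_id": str, "assay_type": str}, ["file"]),
    "file": Entity("file", {"file_id": str}),
}


def downstream(schema: dict[str, Entity], start: str) -> list[str]:
    """Return every entity reachable from `start` by following links."""
    seen: list[str] = []
    stack = list(schema[start].links_to)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.append(name)
            stack.extend(schema[name].links_to)
    return seen
```

Calling `downstream(SCHEMA, "participant")` walks the links and lists the sample, assay, and file entities, which mirrors how a search can move from one entity to the records connected to it.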

Metadata must be explicitly added to the data catalog through the Data Catalog Loader, API calls to /dataCatalog-xxxx/upsertRecords, or project synchronization for data objects.

Metadata does not appear in the catalog automatically. If you enable project synchronization, do so on curated, catalog-ready projects to avoid adding unwanted data objects. For sync behavior and cleanup options, see Controlling Project Synchronization. Users cannot modify added metadata through the catalog interface, unlike standard platform metadata (tags and properties), which authorized users can edit. Because all metadata must conform to the predefined schema structure, this restriction helps preserve data integrity and consistency while maintaining organizational standards.
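
To make the idea of schema conformance concrete, here is a minimal sketch of a pre-ingestion check written in plain Python. It does not reproduce the catalog's own validation: the `validate_record` function, the `SAMPLE_FIELDS` definition, and the assumption that every declared field is required are hypothetical.

```python
def validate_record(record: dict, field_types: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Every declared field must be present and have the declared type (assumed rule).
    for name, expected in field_types.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    # Fields that the schema does not declare are rejected (assumed rule).
    for name in record:
        if name not in field_types:
            problems.append(f"unknown field: {name}")
    return problems


# Hypothetical field definitions for a "sample" entity.
SAMPLE_FIELDS = {"sample_id": str, "collection_date": str, "volume_ml": float}
```

Running a check like this before loading can surface type mismatches early, for example a numeric `sample_id` where the schema expects a string.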

You cannot share metadata with users who don't have access to the project associated with that metadata. To view and search records, users must belong to an organization with access to the data catalog (either the billTo organization or an invited organization) and have appropriate project permissions.

Organizations typically choose a data administrator who controls the metadata ingestion process and any subsequent changes.

Access Control Model

Metadata is stored centrally within a data catalog, but access follows your existing project permissions. You can only see and search metadata from projects where you have at least VIEW access. This means the same search may show different results to different users based on their project permissions.

For cross-project linking and discovery, you can mark certain entities as public within the data catalog schema. Public entities make metadata visible to all data catalog users, regardless of their project permissions. This allows for broader discoverability of reference entities, such as analyses or protocols, while still enforcing project-level access controls. However, users still need appropriate project permissions to access the underlying data objects referenced by public entity records.
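
The visibility rule described above can be sketched as a simple filter. This is a simplified model rather than the Platform's implementation: the record layout, the `visible_records` function, and the project and entity names are hypothetical, and the sketch assumes the user already belongs to an organization with access to the catalog.

```python
def visible_records(records, viewable_projects, public_entities):
    """Return the records a user may see.

    A record is visible when its entity type is marked public in the schema,
    or when the user has at least VIEW access to the record's project.
    """
    return [
        r for r in records
        if r["entity"] in public_entities or r["project"] in viewable_projects
    ]


# Hypothetical records spread over two projects.
RECORDS = [
    {"entity": "Sample", "project": "project-A", "id": "S1"},
    {"entity": "Sample", "project": "project-B", "id": "S2"},
    {"entity": "Protocol", "project": "project-B", "id": "P1"},
]
PUBLIC = {"Protocol"}  # entity types marked public in the schema
```

A user with VIEW access to `project-A` only would see `S1` and the public `P1`, while a user with access to both projects would see all three records. This is why the same search can return different results for different users.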

Data catalogs can be shared across multiple organizations. Users from invited organizations can view, search, update, and delete metadata according to their project permissions. Modifications to the data catalog require the appropriate project permissions, regardless of whether the user is in the billTo or invited organization.

For detailed information about permissions, public entities, and collaboration, see Concepts and Architecture.

Omics Data Catalog Metadata vs. Standard Platform Metadata

The DNAnexus Platform offers two distinct metadata systems, each designed for different use cases:

| Feature | Platform Metadata | Omics Data Catalog |
| --- | --- | --- |
| Data Structure | Key-value pairs and tags | Structured entities with defined relationships |
| Search Scope | Cross-project search with permission filtering via API only | Cross-project search with permission filtering via API and UI |
| Data Relationships | No connections between tags or properties | Structured relationships (for example, participant → sample → assay → file) |
| Schema Enforcement | None | Enforced schema standards with type constraints |
| Use Case | Basic file metadata (size, type, name) plus unstructured tags and properties | Complex omics metadata (assay date, lab info, sample details) |
| Data Discovery | Manual browsing within projects (UI) and cross-project search via API | Faceted filtering across multiple projects |
| Metadata Modification | Users can modify tags and properties | Permission-based modification (read-only in the UI) |

Using the Omics Data Catalog

Prerequisites

  • Your organization has access to an Omics Data Catalog (either as the billTo organization or an invited organization).

  • You have VIEW access to at least one project associated with cataloged metadata.

Finding Your Data

Complex research questions often require filtering across multiple fields from different entities in your schema. The catalog supports filtering that combines cross-entity criteria to help you identify specific datasets.

  1. In the DNAnexus Platform, click Data Catalog.

  2. On the Search tab, use the left panel to filter by specific fields.

  3. Repeat step 2 as needed to filter by multiple fields, which can be from multiple entities.

The filters for different fields are grouped using AND logic, meaning all conditions must be true for a record to be included in the results. After finding the data you need, you can copy data objects to your project for analysis.

For example, you can hover over an entity field, such as the Therapeutic Area (categorical string field) of a Study (entity), and click Add as Filter.

Filtering by specific field values in Omics Data Catalog
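
The AND behavior of combined filters can be illustrated with a short sketch. It is not how the catalog evaluates searches internally: the `matches_all` and `search` functions, the "Entity.field" key style, and the sample rows are hypothetical, and the rows are assumed to be already joined across entities.

```python
def matches_all(record: dict, filters: dict) -> bool:
    """Return True only when every filter condition holds (AND logic)."""
    return all(record.get(name) == wanted for name, wanted in filters.items())


def search(records: list[dict], filters: dict) -> list[dict]:
    """Keep the records that satisfy all filters; no filters keeps everything."""
    return [r for r in records if matches_all(r, filters)]


# Hypothetical, already-joined rows: keys are "Entity.field" names.
ROWS = [
    {"Study.therapeutic_area": "Oncology", "Sample.tissue": "blood"},
    {"Study.therapeutic_area": "Oncology", "Sample.tissue": "liver"},
    {"Study.therapeutic_area": "Cardiology", "Sample.tissue": "blood"},
]
```

Filtering on the therapeutic area alone keeps two rows, and adding a tissue filter narrows the result to one. Each additional filter can only shrink the result set, which is worth keeping in mind when a search returns fewer records than expected.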

What if search results don't match your expectations?

When a search doesn't return the results you expect, check whether the filter combination is too restrictive by removing criteria one at a time. Remember that your project permissions automatically filter all results, so contact project administrators if you expect to see additional data. Understanding your organization's schema also helps explain why certain relationships or combinations might not exist as expected.

The catalog's power comes from modeling research data as interconnected entities rather than isolated files. By navigating these relationships, you can explore data across the complete research workflow, from study participants through sample collection to final analysis outputs.

This relationship-aware approach removes much of the manual cross-referencing that file-based data organization typically requires. Instead of correlating metadata across spreadsheets by hand, you can navigate directly through the data model to build a complete picture of your research assets.

To find metadata linked to a specific record:

  1. Click the record to open its details.

  2. In the record's details on the right, filter by the linked entities displayed at the top of the list.

For example, within the details of a specific sample, you can click Filter Linked Study to view related data objects. Be aware that this replaces your existing filtering criteria.

Selecting to filter by a specific linked Study record

Whenever records link to data objects, such as files, you can open the files directly from the catalog by clicking the links in the record details. Alternatively, when viewing Data Object entities, you can click their IDs directly in the search results table.

Working with Search Results

After finding relevant data through search and navigation, you can copy, share, or export your results.

Copy Data Objects to Your Project

To work with data objects from your search results:

  1. Switch to the Data Objects entities view in the search results.

  2. Select the data objects you want to copy.

  3. Click Copy to Project.

Selecting data objects to copy

This copies the actual files into your project, where you can analyze them or use them as inputs for workflows.

Share and Export Results

You can share your search results with colleagues in two ways:

  • Export Results: Select specific records and export them as CSV files using the Export button. The exported file includes all entity fields, even those not visible in the current table view.

  • Share URLs: Click Copy URL to share your search state. Recipients see results filtered by their own project permissions, ensuring secure collaboration without exposing data they shouldn't access.

The export is limited to 250 records. For searches returning more than 250 records, add filters to narrow your results, or use the /datacatalog-xxxx/findRecords API method with pagination to retrieve larger datasets programmatically.
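
Retrieving a larger result set programmatically usually comes down to a pagination loop like the sketch below. The request and response format of the findRecords method is not described here, so the sketch takes the actual API call as an injected `fetch_page` function. The `"records"` and `"next"` keys and the token-based continuation are assumptions made for illustration, not the documented response shape.

```python
from typing import Callable, Iterator, Optional


def iter_records(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield records page by page until no continuation token is returned.

    `fetch_page(token)` is a caller-supplied function that performs one API
    request and returns a dict with "records" and an optional "next" token
    (hypothetical shape).
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page["records"]
        token = page.get("next")
        if token is None:
            break
```

In practice, `fetch_page` would wrap your API client of choice. Keeping the loop separate from the request makes it easy to test with canned pages and to adapt once the real pagination fields are known.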

Exploring Catalog Schema

Understanding the data catalog schema helps you use the catalog more effectively.

The Schema tab provides an interactive map of how different types of metadata connect to each other in your organization's catalog, visualizing the following:

  • Entities - The different categories of research data your organization tracks, such as Studies, Participants, Samples, Assays, or Data Objects.

  • Fields - The specific information stored for each entity type, such as collection date for samples, or assay type for experiments.

  • Relationships - How different entity types connect to each other, such as how participants relate to samples, or how samples connect to analysis files.

Interactive schema visualization with insights into upstream and downstream related entities

On the Schema tab, you can download the whole schema as a CSV file, or download CSV templates for individual entity types to simplify ingestion with the Data Catalog Loader app.
