# Omics Data Catalog

{% hint style="info" %}
A license is required to use the Omics Data Catalog on the DNAnexus Platform. [Contact DNAnexus Sales](mailto:sales@dnanexus.com) for more information.
{% endhint %}

Omics Data Catalog is a metadata management system that addresses a core challenge in scientific research: data discoverability. The catalog stores structured metadata about the research data you keep on the Platform, making that data more findable, accessible, interoperable, and reusable ([FAIR](https://www.go-fair.org/fair-principles/)). Unlike standard platform metadata (key-value pairs), the catalog creates relationships between customizable entities, such as studies, samples, assays, and data files, enabling search across projects. Data catalogs can be shared across multiple organizations, enabling cross-organizational collaboration while maintaining project-based access controls.

## How does the Omics Data Catalog work?

The catalog stores structured information about your data but does not manage the actual files. It relies on a schema that is customized to your research needs. For example, the schema might define connections between different entities, such as participants, samples, assays, and files, as well as specific fields of those entities.
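As a rough illustration of what such a schema expresses, the sketch below models entity types, their fields, and their upstream links in plain Python. It is not the catalog's actual schema format: the `Entity` class, the field names, and the `lineage` helper are all hypothetical and exist only to show the idea of entities connected by defined relationships.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entity:
    """One entity type in a hypothetical catalog schema."""
    name: str
    fields: Dict[str, type]       # field name -> Python type standing in for a schema type
    parent: Optional[str] = None  # name of the upstream entity type, if any


# Hypothetical schema: participant -> sample -> assay -> file
SCHEMA = {
    e.name: e
    for e in [
        Entity("participant", {"participant_id": str, "sex": str}),
        Entity("sample", {"sample_id": str, "collection_date": str}, parent="participant"),
        Entity("assay", {"assay_id": str, "assay_type": str}, parent="sample"),
        Entity("file", {"file_id": str}, parent="assay"),
    ]
}


def lineage(entity_name, schema=SCHEMA):
    """Return the chain of entity names from the given entity up to the root."""
    chain = []
    current = entity_name
    while current is not None:
        chain.append(current)
        current = schema[current].parent
    return chain
```

Calling `lineage("file")` walks the upstream links and returns the chain from file back to participant, which is the kind of connection the schema makes searchable.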

Metadata must be explicitly added to the data catalog through the [Data Catalog Loader](https://documentation.dnanexus.com/developer/ingesting-data/data-catalog-loader), API calls to [`/dataCatalog-xxxx/upsertRecords`](https://documentation.dnanexus.com/developer/api/omics-data-catalog#api-method-datacatalog-xxxx-upsertrecords), or [project synchronization](https://documentation.dnanexus.com/user/concepts-and-architecture#controlling-project-synchronization) for data objects.

Metadata does not appear in the catalog automatically. To avoid adding unwanted data objects, enable project synchronization only on curated, catalog-ready projects. For sync behavior and cleanup options, see [Controlling Project Synchronization](https://documentation.dnanexus.com/user/concepts-and-architecture#controlling-project-synchronization). Unlike standard platform metadata (tags and properties), which authorized users can edit, catalog metadata cannot be modified through the catalog interface. Because all metadata must conform to the predefined schema structure, this preserves data integrity and consistency while maintaining organizational standards.
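To make the idea of schema conformance concrete, here is a minimal sketch of checking a record against field type constraints before ingestion. It is an illustration only: the catalog's real validation rules are not described here, and `SAMPLE_SCHEMA`, its field names, and `validate_record` are hypothetical. The sketch also assumes every schema field is required, which may not hold for a real schema.

```python
# Hypothetical field definitions for a "sample" entity.
SAMPLE_SCHEMA = {"sample_id": str, "tissue": str, "volume_ml": float}


def validate_record(record, entity_schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    # Every schema field must be present with the expected type (assumption).
    for name, expected_type in entity_schema.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    # Fields that the schema does not define are rejected.
    for name in record:
        if name not in entity_schema:
            problems.append(f"unknown field: {name}")
    return problems
```

A record such as `{"sample_id": "S1", "tissue": "liver", "volume_ml": "1.5"}` would be reported as having the wrong type for `volume_ml`, because the value is a string rather than a number.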

You cannot share metadata with users who don't have access to the project associated with that metadata. To view and search records, users must belong to an organization with access to the data catalog (either the `billTo` organization or an invited organization) and have appropriate project permissions.

Organizations typically choose a data administrator who controls the metadata ingestion process and any subsequent changes.

### Access Control Model

Metadata is stored centrally within a data catalog, but access follows your existing [project permissions](https://documentation.dnanexus.com/getting-started/key-concepts/projects#project-access-levels). You can only see and search metadata from projects where you have at least VIEW access. This means the same search may show different results to different users based on their project permissions.

For cross-project linking and discovery, you can mark certain entities as public within the data catalog schema. Public entities make metadata visible to all data catalog users, regardless of their project permissions. This allows for broader discoverability of reference entities, such as analyses or protocols, while still enforcing project-level access controls. However, users still need appropriate project permissions to access the underlying data objects referenced by public entity records.
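The visibility rule described above can be sketched as a simple filter: a record is visible if its entity is marked public or if the user has at least VIEW access to its project. This is a conceptual model, not the Platform's implementation. The record layout, the project names, and the `visible_records` function are hypothetical.

```python
# Hypothetical records, each tied to the project it was ingested from.
RECORDS = [
    {"entity": "protocol", "id": "P1", "project": "project-A"},
    {"entity": "sample", "id": "S1", "project": "project-A"},
    {"entity": "sample", "id": "S2", "project": "project-B"},
]

# Entities marked public in the (hypothetical) schema.
PUBLIC_ENTITIES = {"protocol"}


def visible_records(records, viewable_projects, public_entities):
    """Keep records that are public or belong to a project the user can view."""
    return [
        r
        for r in records
        if r["entity"] in public_entities or r["project"] in viewable_projects
    ]


# The same search yields different results for different users.
alice = visible_records(RECORDS, {"project-A"}, PUBLIC_ENTITIES)
bob = visible_records(RECORDS, {"project-B"}, PUBLIC_ENTITIES)
```

Here both users see the public protocol record, but each sees only the sample from the project they can view. Seeing a public record does not grant access to any data object it references.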

Data catalogs can be shared across multiple organizations. Users from invited organizations can view, search, update, and delete metadata according to their project permissions. Modifications to the data catalog require the appropriate project permissions, regardless of whether the user is in the `billTo` or invited organization.

For detailed information about permissions, public entities, and collaboration, see [Concepts and Architecture](https://documentation.dnanexus.com/user/concepts-and-architecture#access-control-and-permissions).

### Omics Data Catalog Metadata vs. Standard Platform Metadata

The DNAnexus Platform offers two distinct metadata systems, each designed for different use cases:

| Feature                   | Platform Metadata                                                            | Omics Data Catalog                                                          |
| ------------------------- | ---------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| **Data Structure**        | Key-value pairs and tags                                                     | Structured entities with defined relationships                              |
| **Search Scope**          | Cross-project search with permission filtering via API only                  | Cross-project search with permission filtering via API and UI               |
| **Data Relationships**    | No connections between tags or properties                                    | Structured relationships (for example, participant → sample → assay → file) |
| **Schema Enforcement**    | None                                                                         | Enforced schema standards with type constraints                             |
| **Use Case**              | Basic file metadata (size, type, name) plus unstructured tags and properties | Complex omics metadata (assay date, lab info, sample details)               |
| **Data Discovery**        | Manual browsing within projects (UI) and cross-project search via API        | Faceted filtering across multiple projects                                  |
| **Metadata Modification** | Users can modify tags and properties                                         | Permission-based modification (read-only in the UI)                         |

## Using the Omics Data Catalog

### Prerequisites

* Your organization has access to an Omics Data Catalog (either as the `billTo` organization or an invited organization).
* You have VIEW access to at least one project associated with cataloged metadata.

### Finding Your Data

Complex research questions often require filtering across multiple fields from different entities in your schema. The catalog supports filtering that combines cross-entity criteria to help you identify specific datasets.

1. In the DNAnexus Platform, click **Data Catalog**.
2. On the **Search** tab, use the left panel to filter by specific fields.
3. Repeat step 2 as needed to filter by multiple fields, which can be from multiple entities.

The filters for different fields are grouped using AND logic, meaning all conditions must be true for a record to be included in the results. After finding the data you need, you can [copy data objects to your project](#working-with-search-results) for analysis.

For example, you can hover over an entity field, such as the *Therapeutic Area* (categorical string field) of a *Study* (entity), and click **Add as Filter**.

![Filtering by specific field values in Omics Data Catalog](https://1612471957-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-L_EsL_ie8XyZlLe_yf9%2Fuploads%2Fgit-blob-42b50fbf9f2c9712b55ba8352342b2ba57e0f069%2Fomics-data-catalog-filters.png?alt=media)
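The AND logic between fields can be pictured with a few lines of Python. This sketch is only a mental model of the behavior described above, not how the catalog evaluates searches. The flattened record layout, the field names such as `study.therapeutic_area`, and the `search` function are hypothetical.

```python
# Hypothetical records with fields drawn from two entities (study and sample).
RECORDS = [
    {"study.therapeutic_area": "Oncology", "sample.tissue": "liver"},
    {"study.therapeutic_area": "Oncology", "sample.tissue": "blood"},
    {"study.therapeutic_area": "Cardiology", "sample.tissue": "blood"},
]


def matches_all(record, filters):
    """True only if every filtered field has the requested value (AND logic)."""
    return all(record.get(field) == value for field, value in filters.items())


def search(records, filters):
    """Return the records that satisfy all filters."""
    return [r for r in records if matches_all(r, filters)]


results = search(
    RECORDS,
    {"study.therapeutic_area": "Oncology", "sample.tissue": "blood"},
)
```

Only the second record satisfies both conditions. Each added filter can only narrow the result set, which is why removing filters one at a time is a useful way to find an overly restrictive criterion.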

{% hint style="info" %}
**What if search results don't match your expectations?**

When searches don't return expected results, verify that filter combinations aren't overly restrictive by removing criteria systematically. Remember that your project permissions automatically filter all results, so contact project administrators if you expect to see additional data. Also, understanding your organization's schema helps identify why certain relationships or combinations might not exist as expected.
{% endhint %}

### Navigating Linked Metadata

The catalog's power comes from modeling research data as interconnected entities rather than isolated files. By navigating these relationships, you can explore data across the complete research workflow, from study participants through sample collection to final analysis outputs.

This relationship-aware approach eliminates the work typically required with file-based data organization systems. Instead of manually correlating metadata across spreadsheets, you can navigate directly through the data model to build complete pictures of your research assets.
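One way to picture this navigation is as a walk over links between records. The sketch below collects everything downstream of a starting record, for example all samples and files that trace back to one study. It is a conceptual illustration: the record IDs, the `(child, parent)` link representation, and the `downstream` function are hypothetical and do not reflect how the catalog stores relationships.

```python
from collections import defaultdict, deque

# Hypothetical (child, parent) links between records.
LINKS = [
    ("sample-1", "study-1"),
    ("sample-2", "study-1"),
    ("file-1", "sample-1"),
    ("file-2", "sample-2"),
    ("sample-3", "study-2"),
    ("file-3", "sample-3"),
]


def downstream(record_id, links):
    """Return all records reachable from record_id by following links downward."""
    children = defaultdict(list)
    for child, parent in links:
        children[parent].append(child)

    found = []
    queue = deque([record_id])
    # Breadth-first walk: direct children first, then their children.
    while queue:
        current = queue.popleft()
        for child in children[current]:
            found.append(child)
            queue.append(child)
    return found
```

Starting from `study-1`, the walk reaches its two samples and then the file linked to each sample, without touching anything that belongs to `study-2`.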

To find metadata linked to a specific record:

1. Click the record to open its details.
2. In the record's details on the right, filter by the linked entities displayed at the top of the list.

For example, within the details of a specific sample, you can click **Filter Linked Study** to view related data objects. Be aware that this replaces your existing filtering criteria.

![Selecting to filter by a specific linked Study record](https://1612471957-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-L_EsL_ie8XyZlLe_yf9%2Fuploads%2Fgit-blob-de336866a82d04f529943a0c4b390725611c02f7%2Fomics-data-catalog-filter-linked-study.png?alt=media)

Whenever records link to data objects, such as files, you can open the files directly from the catalog by clicking the links in the record details. Alternatively, when viewing Data Object entities, you can click their IDs directly in the search results table.

{% hint style="success" %}
You can navigate between past filtering criteria using the **Undo filters** and **Redo filters** buttons in the top right to step backward and forward through your filter history.

<img src="https://1612471957-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-L_EsL_ie8XyZlLe_yf9%2Fuploads%2Fgit-blob-736fdbd893596434f0a841c4fc0ff4e46591f193%2Fomics-data-catalog-undo-redo.png?alt=media" alt="" data-size="original">
{% endhint %}

### Working with Search Results

After finding relevant data through search and navigation, you can copy, share, or export your results.

#### Copy Data Objects to Your Project

To work with data objects from your search results:

1. Switch to the **Data Objects** entities view in the search results.
2. Select the data objects you want to copy.
3. Click **Copy to Project**.

![Selecting data objects to copy](https://1612471957-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-L_EsL_ie8XyZlLe_yf9%2Fuploads%2Fgit-blob-740ea5318efe106de459a05d3fc40f27a6f4c8a5%2Fomics-data-catalog-copy-to-project.png?alt=media)

This copies the actual files into your project, where you can analyze them or use them as inputs for workflows.

#### Share and Export Results

You can share your search results with colleagues in two ways:

* **Export Results**: Select specific records and export them as CSV files using the **Export** button. The exported file includes all entity fields, even those not visible in the current table view.
* **Share URLs**: Click **Copy URL** to share your search state. Recipients see results filtered by their own project permissions, ensuring secure collaboration without exposing data they shouldn't access.

{% hint style="info" %}
The export is limited to 250 records. For searches returning more than 250 records, add filters to narrow your results, or use the [`/datacatalog-xxxx/findRecords`](https://documentation.dnanexus.com/developer/api/omics-data-catalog#api-method-datacatalog-xxxx-findrecords) API method with pagination to retrieve larger datasets programmatically.
{% endhint %}
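For retrieving more than 250 records programmatically, a pagination loop generally follows the pattern sketched below. The request and response fields of the `findRecords` method are not shown on this page, so the sketch deliberately avoids calling the API: `fetch_page`, the `records` and `next` keys, and the in-memory `make_fake_fetch` stand-in are all assumptions made for illustration. Check the API reference for the actual field names before adapting it.

```python
def iterate_records(fetch_page, page_size=250):
    """Yield every record by requesting pages until no cursor is returned.

    fetch_page(cursor, limit) is a caller-supplied function (hypothetical)
    that returns a dict with a "records" list and a "next" cursor, where
    "next" is None on the last page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["records"]
        cursor = page.get("next")
        if cursor is None:
            break


def make_fake_fetch(all_records):
    """Build an in-memory stand-in for a paginated API, for demonstration only."""
    def fetch_page(cursor, limit):
        start = cursor or 0
        end = start + limit
        next_cursor = end if end < len(all_records) else None
        return {"records": all_records[start:end], "next": next_cursor}
    return fetch_page


# Example: 600 fake records are retrieved in three pages of up to 250.
fetched = list(iterate_records(make_fake_fetch(list(range(600)))))
```

In real use, `fetch_page` would wrap an authenticated call to the API method and translate its response into this shape. Keeping the loop separate from the request makes it possible to test the paging logic without network access.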

## Exploring Catalog Schema

Understanding the data catalog schema helps you use the catalog more effectively.

The **Schema** tab provides an interactive map of how different types of metadata connect to each other in your organization's catalog, visualizing the following:

* **Entities** - The different categories of research data your organization tracks, such as Studies, Participants, Samples, Assays, or Data Objects.
* **Fields** - The specific information stored for each entity type, such as collection date for samples, or assay type for experiments.
* **Relationships** - How different entity types connect to each other, such as how participants relate to samples, or how samples connect to analysis files.

![Interactive schema visualization with insights into upstream and downstream related entities](https://1612471957-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-L_EsL_ie8XyZlLe_yf9%2Fuploads%2Fgit-blob-5126830536ae5a4fc610a507ef4fb2941236307b%2Fomics-data-catalog-schema-tab.png?alt=media)

On the **Schema** tab, you can download the whole schema as a CSV file. You can also download CSV templates for individual entity types, which simplify ingestion with the Data Catalog Loader app.

## Next Steps

* Understand the [concepts and architecture](https://documentation.dnanexus.com/user/omics-data-catalog/concepts-and-architecture) of Omics Data Catalog.
* Ingest metadata using the [Data Catalog Loader app](https://documentation.dnanexus.com/developer/ingesting-data/data-catalog-loader).
