# Dataset Management

{% hint style="info" %}
An Apollo license is required to use Apollo Datasets on the DNAnexus Platform. Org approval may also be required. [Contact DNAnexus Sales](mailto:sales@dnanexus.com) for more information.
{% endhint %}

An [Apollo Dataset](/developer/datasets.md) is a record that contains the metadata, relationship mapping, and storage locations associated with a set of data. A Dataset is constructed from instances of data, each with specific characteristics that conform to a user-defined or predefined molecular data model of known properties and relationships.

User-defined data is stored in the Dataset's Entities and is most commonly created through the [Data Model Loader](https://platform.dnanexus.com/app/data_model_loader_v2) app. Entities form a flexible structure in which every relationship has only a single path to the main Entity (no circular references) and each Entity contains a set of Fields that map to raw ingested data. This structure can hold a wide array of data types, such as clinical data, electronic health records, molecular phenotypes, biomarkers, and general practitioner notes. It supports a bring-your-own-schema and bring-your-own-ontology model.
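The single-path rule can be illustrated with a minimal sketch. This is not part of the Data Model Loader or any DNAnexus API. It assumes one reading of the rule, namely that the Entities form a tree rooted at the main Entity. The function name, the `(child, parent)` pair format, and the Entity names are all hypothetical.

```python
from collections import defaultdict, deque


def single_path_to_main(main_entity, relationships):
    """Return True if every entity has exactly one path to main_entity.

    relationships: iterable of (child, parent) pairs (hypothetical format).
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    entities = {main_entity}
    for child, parent in relationships:
        parents[child].add(parent)
        children[parent].add(child)
        entities.update((child, parent))

    # The main entity must be the root, and every other entity
    # must have exactly one parent.
    if parents[main_entity]:
        return False
    if any(len(parents[e]) != 1 for e in entities if e != main_entity):
        return False

    # Walk down from the main entity. With one parent per entity,
    # reaching every entity rules out cycles and disconnected entities.
    seen = {main_entity}
    queue = deque([main_entity])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen == entities


# Hypothetical entity names, for illustration only.
valid = [("visit", "patient"), ("lab_result", "visit")]
two_paths = valid + [("lab_result", "patient")]
print(single_path_to_main("patient", valid))
print(single_path_to_main("patient", two_paths))
```

In this sketch, the first model passes because each Entity reaches `patient` along one route, while the second fails because `lab_result` could reach `patient` both directly and through `visit`.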

Datasets can also contain Assays alongside Entities. Assays are rigid, predefined structures built to support common molecular data formats with scalable, predetermined metadata. Apps such as the [Molecular Expression Assay Loader](https://platform.dnanexus.com/app/molecular-expression-assay-loader) load bulk RNA-seq data at scale and create a Dataset with the appropriate Assay.

Using the [Clinical Dataset Merger](https://platform.dnanexus.com/app/clinical_dataset_merger) and the [Assay Dataset Merger](https://platform.dnanexus.com/app/assay_dataset_merger), a data administrator can link multiple Datasets together to create the right Dataset to share, based on the data relationships, users' needs, and access requirements. Multiple Datasets can be built on top of the same ingested data, so that different views can be shared with different people without duplicating the underlying data.

When new phenotypic data is added incrementally at high frequency (for example, through daily, weekly, or monthly updates), a user can end up with multiple Datasets, each containing incrementally more information. When a Dataset grows in this way through new Fields, Entities, or even new Assays, the [Rebase Cohorts And Dashboards](https://platform.dnanexus.com/app/rebase_cohorts_and_dashboards) app can be used to migrate existing dashboard views and [cohorts](/developer/datasets/cohorts.md) from an older, smaller Dataset to a newer Dataset.
