Dataset Management

An Apollo license is required to use Apollo Datasets on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

An Apollo Dataset is a record that contains the metadata, relationship mapping, and storage locations associated with a set of data. A Dataset is constructed from instances of data, each with specific characteristics that conform to a user-defined or predefined molecular data model of known properties and relationships.

User-defined data is stored in the Dataset's Entities and is most commonly created through the Data Model Loader app. Entities form a flexible structure in which every relationship has only a single path to the main Entity (no circular references) and each Entity contains a set of Fields that map to the raw ingested data. This structure can hold a wide array of data types, such as clinical data, electronic health records, molecular phenotypes, biomarkers, and general practitioner notes. It supports a bring-your-own-schema and bring-your-own-ontology model.
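The single-path rule can be illustrated with a minimal sketch. This is not part of the DNAnexus API or the Data Model Loader; the function name `validate_entity_hierarchy`, the `(child, parent)` pair representation, and the example entity names are all hypothetical, chosen only to show what "one path to the main Entity, no circular references" means in practice.

```python
def validate_entity_hierarchy(main_entity, relationships):
    """Check that every entity has exactly one path to the main entity.

    relationships is a list of (child, parent) pairs (a hypothetical
    representation). Returns a list of problem descriptions; an empty
    list means the hierarchy satisfies the single-path rule.
    """
    parents = {}
    entities = {main_entity}
    for child, parent in relationships:
        parents.setdefault(child, set()).add(parent)
        entities.update((child, parent))

    problems = []
    if parents.get(main_entity):
        problems.append(f"{main_entity}: main entity must not have a parent")

    others = sorted(entities - {main_entity})

    # Every non-main entity needs exactly one parent.
    for entity in others:
        count = len(parents.get(entity, ()))
        if count != 1:
            problems.append(f"{entity}: expected exactly one parent, found {count}")

    # Walking up the parent chain must reach the main entity without a loop.
    for entity in others:
        seen = set()
        current = entity
        while current != main_entity:
            if current in seen:
                problems.append(f"{entity}: circular reference")
                break
            seen.add(current)
            current_parents = parents.get(current, set())
            if len(current_parents) != 1:
                break  # already reported by the parent-count check
            current = next(iter(current_parents))
    return problems


# Hypothetical example: patient is the main entity.
valid = [("visit", "patient"), ("lab_result", "visit")]
cyclic = [("a", "b"), ("b", "a")]
print(validate_entity_hierarchy("patient", valid))
print(validate_entity_hierarchy("patient", cyclic))
```

The parent-count check alone would not catch a pair of entities that point only at each other, which is why the sketch also walks each chain up to the main entity.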

Datasets can also contain Assays alongside Entities. Assays are rigid, predefined structures built to support common molecular data formats with scalable, predetermined metadata. Apps such as Molecular Expression Assay Loader are used to load bulk RNAseq data at scale and create a Dataset with the appropriate Assay.

Using the Clinical Dataset Merger and the Assay Dataset Merger, a data administrator can link multiple Datasets together to create a Dataset tailored for sharing, based on the data relationships, user needs, and access requirements. Multiple Datasets can be built on top of the same ingested data, so that different views can be shared with different people without duplicating the underlying data.

When new phenotypic data is added incrementally at high frequency, for example through daily, weekly, or monthly updates, a user can end up with multiple Datasets, each containing incrementally more information. In these cases, where a Dataset grows through new Fields, Entities, and even new Assays, the Rebase Cohorts And Dashboards app may be used to quickly migrate existing dashboard views and cohorts from an older, smaller Dataset to a newer Dataset.
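As a rough illustration of the "older, smaller Dataset" versus "newer Dataset" relationship, the sketch below compares two schemas and lists anything present in the older one but absent from the newer one. It does not describe how the Rebase Cohorts And Dashboards app works internally; the function name `missing_from_newer` and the entity-to-field-set dictionary layout are assumptions made for this example only.

```python
def missing_from_newer(older, newer):
    """List entities and fields that exist in `older` but not in `newer`.

    Both arguments map an entity name to a set of field names
    (a hypothetical schema summary). An empty result means the newer
    schema contains everything the older one does.
    """
    missing = []
    for entity, fields in older.items():
        newer_fields = newer.get(entity)
        if newer_fields is None:
            # The whole entity is absent from the newer schema.
            missing.append(entity)
            continue
        for field in set(fields) - set(newer_fields):
            missing.append(f"{entity}.{field}")
    return sorted(missing)


# Hypothetical schemas: the newer one adds a field and an entity.
older = {"patient": {"age", "sex"}}
newer = {"patient": {"age", "sex", "bmi"}, "visit": {"date"}}
print(missing_from_newer(older, newer))
```

A check like this could be run before migrating cohorts, since a filter that references a field missing from the newer schema would have nothing to point at.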

