Dataset Management

An Apollo license is required to use Apollo Datasets on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

An Apollo Dataset is a record which contains all of the metadata, relationship mapping, and storage locations associated with a set of data. A Dataset is constructed from instances of data, each having specific data characteristics which conform to a user-defined or predefined molecular data model of known properties and relationships.

User-defined data is stored in the Dataset’s Entities and is most commonly created through the Data Model Loader app. Entities have a flexible structure in which every relationship has only a single path to the main Entity (no circular references) and each Entity contains a set of Fields that map to the raw ingested data. This structure can hold a wide array of data (e.g., clinical data, electronic health records, molecular phenotypes, biomarkers, general practitioner notes) and supports a bring-your-own-schema and bring-your-own-ontology model. In addition to Entities, a Dataset may have Assays. Assays are rigid, predefined structures built to support common molecular data formats with scalable, predetermined metadata. Apps such as Molecular Expression Assay Loader load bulk RNA-seq data at scale and create a Dataset with the appropriate Assay.
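The single-path rule for Entities can be illustrated with a small check. The sketch below is not part of the DNAnexus API or the Data Model Loader; the function name `validate_entity_tree`, the `(child, parent)` link representation, and the example Entity names are all hypothetical, chosen only to show what "one path to the main Entity, no circular references" means in practice.

```python
def validate_entity_tree(main_entity, entities, links):
    """Return a list of problems; an empty list means every Entity
    has exactly one path to main_entity.

    links is a hypothetical representation: (child, parent) pairs.
    """
    parents = {}
    problems = []

    # An Entity with two parent links would have more than one path.
    for child, parent in links:
        if child in parents:
            problems.append(f"{child} has more than one parent link")
        else:
            parents[child] = parent

    # The main Entity is the root, so it must not point at anything.
    if main_entity in parents:
        problems.append(f"{main_entity} must not have a parent link")

    # Walk from each Entity toward the main Entity.
    for entity in entities:
        if entity == main_entity:
            continue
        seen = set()
        current = entity
        while current != main_entity:
            if current in seen:
                problems.append(f"{entity} is part of a circular reference")
                break
            seen.add(current)
            if current not in parents:
                problems.append(f"{entity} has no path to {main_entity}")
                break
            current = parents[current]
    return problems


# Hypothetical Entities: each lab result belongs to a visit, each visit to a patient.
ok = validate_entity_tree(
    "patient",
    ["patient", "visit", "lab_result"],
    [("visit", "patient"), ("lab_result", "visit")],
)

# Two Entities that only reference each other never reach the main Entity.
bad = validate_entity_tree(
    "patient",
    ["patient", "a", "b"],
    [("a", "b"), ("b", "a")],
)
```

Here `ok` is an empty list, while `bad` reports a circular reference for both `a` and `b`.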

Using the Clinical Dataset Merger and the Assay Dataset Merger, a data administrator can link multiple Datasets together to create a Dataset tailored for sharing, based on the data relationships, users' needs, and access requirements. Multiple Datasets can be built on top of the same ingested data, so different views can be shared with different people without duplicating the underlying data.

When new phenotypic data is added incrementally at high frequency (e.g., daily, weekly, or monthly), a user can end up with several Datasets, each containing incrementally more information. In these cases, where a Dataset grows through new Fields, Entities, and even new Assays, the Rebase Cohorts And Dashboards app can be used to quickly migrate existing dashboard views and cohorts from an older, smaller Dataset to a newer one.
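The "older, smaller Dataset to a newer Dataset" relationship can be pictured as a superset check on Fields. The sketch below does not describe how the Rebase Cohorts And Dashboards app works internally; the function `missing_fields` and the `entity.field` naming are hypothetical, and it assumes only that a cohort or dashboard built on the older Dataset refers to Fields by name.

```python
def missing_fields(old_fields, new_fields):
    """Return Fields present in the older Dataset but absent from the newer one.

    Field names use a hypothetical "entity.field" form for illustration.
    """
    return sorted(set(old_fields) - set(new_fields))


# Hypothetical Field lists: the newer Dataset adds a visit Entity.
older = ["patient.age", "patient.sex"]
newer = ["patient.age", "patient.sex", "visit.date"]

gaps = missing_fields(older, newer)
```

An empty `gaps` list means the newer Dataset only adds to the older one, which is the growth pattern described above.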
