Dataset Versions

Learn about the differences between V1.1 and V3.0 Apollo Datasets.

An Apollo license is required to use Apollo Datasets on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

Because the dataset is a series of JSON objects, fundamental changes to its structure are versioned, which makes it simpler to communicate which functionality each dataset supports. The intent is that the current version (v3.0) will continue to support evolving user needs, and that future updates will be incremental rather than major.

V3.0 Dataset

The v3.0 dataset framework was built to provide a flexible infrastructure to support data from the following 4 groups:

Phenotypical/Clinical Data

With v3.0, broader support was built for phenotypical/clinical data that is split across many entities with one-to-many relationships and that contains richer longitudinal data attributes. This data is most commonly ingested via Data Model Loader or other standardized pipelines (e.g. the UK Biobank loading pipeline). Additionally, the v3.0 data model supports linking data across multiple databases.

Assay Data

The v3.0 data model expands the relationship between phenotypic data and omic data so that multiple assay types can be linked to a single core phenotypic dataset. Each assay can link directly to the core main entity (e.g. Patient) or to a different entity (e.g. an RNAseq assay linked to a specific Encounter). This data is most commonly ingested via a specialized loader (e.g. VCF ETL Orchestrator) and then merged with a core phenotypic dataset to generate the novel pheno-geno dataset.

Default Dashboards

With the v3.0 framework, dashboards are directly explorable, so defaults are no longer needed and a dashboard can be shared with most users directly. If a default dashboard is desired, set it in the properties of the dataset object as follows:

For every dashboard that should appear in the dropdown, add a property entry of the form dashboard-<display name> : <dashboard record-id>, where the display name can be any alphanumeric name, including spaces. Up to 10 such entries can be added. To choose which dashboard is loaded by default, add one additional property, defaultDashboard : <display name>, where the display name matches one of the entries. The user must have viewer access to all of the referenced records.

A sample configuration with three dashboards:

dashboard-group 1 dash: record-G39384knk39ksdnf32
dashboard-group 2 dash: record-G393d9e949mgyJel42
dashboard-group 3 dash: record-G84nheiIKjenmlk84j
defaultDashboard: group 1 dash
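
The naming rules above can be checked before the properties are applied. The following is a minimal sketch in Python, not an official utility: the helper name build_dashboard_properties and its validation logic are assumptions made for illustration, based only on the rules described in this section. It builds the property dictionary for the sample configuration above.

```python
import re

MAX_DASHBOARDS = 10  # limit on dashboard entries stated in this section
# Assumption: "alphanumeric including spaces" means letters, digits, and spaces only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]+")


def build_dashboard_properties(dashboards, default=None):
    """Hypothetical helper: map {display name: record ID} to dataset properties."""
    if len(dashboards) > MAX_DASHBOARDS:
        raise ValueError(f"at most {MAX_DASHBOARDS} dashboard entries are allowed")
    properties = {}
    for name, record_id in dashboards.items():
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"display name must be alphanumeric (spaces allowed): {name!r}")
        # Each dashboard becomes one property keyed as dashboard-<display name>.
        properties[f"dashboard-{name}"] = record_id
    if default is not None:
        # The default must match the display name of one of the entries.
        if default not in dashboards:
            raise ValueError(f"default {default!r} does not match any dashboard entry")
        properties["defaultDashboard"] = default
    return properties


properties = build_dashboard_properties(
    {
        "group 1 dash": "record-G39384knk39ksdnf32",
        "group 2 dash": "record-G393d9e949mgyJel42",
        "group 3 dash": "record-G84nheiIKjenmlk84j",
    },
    default="group 1 dash",
)
for key, value in properties.items():
    print(f"{key}: {value}")
```

The resulting key/value pairs could then be set on the dataset object, for example with the dx set_properties command or the set_properties method in dxpy; which route fits best depends on your environment.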

V1.1 Dataset

Most datasets ingested before Q4 2020 were in v1.1 format. The Legacy Cohort Browser supports visualizing v1.1 datasets.

Default Dashboards

The default dashboards in v1.1 are embedded in the dataset. As a result, setting or updating the defaults requires recreating the dataset using Data Model Loader with the skip ingestion setting.
