Dataset Versions

Because the dataset is a series of JSON objects, fundamental changes to its structure are versioned to make it easier to communicate which functionality each dataset supports. The intent is that the current version (v3.0) will continue to evolve with user needs, and that any future updates will be incremental rather than major.

V3.0 Dataset

The v3.0 dataset framework was built to provide a flexible infrastructure that supports data from the following 4 groups:

Phenotypical/Clinical Data

With v3.0, broader support was built for phenotypical/clinical data that is split across many entities with one-to-many relationships and that contains longitudinal attributes. This data is most commonly ingested via Data Model Loader or other standardized pipelines (e.g. the UK Biobank loading pipeline). Additionally, the v3.0 data model supports linking data across multiple databases.

Assay Data (across multiple omics)

The v3.0 data model expands the relationship between phenotypic data and omic data so that multiple assay types can be linked to a single core phenotypic dataset. Assays can link directly to the main entity (e.g. Patient), or each assay can link to a different entity (e.g. an RNAseq assay linked to a specific Encounter). This data is most commonly ingested via a specialized loader (e.g. VCF ETL Orchestrator) and then merged with a core phenotypic dataset to generate a new pheno-geno dataset.

Analysis Data

Analyses that link to assay data or phenotypic data (e.g. GWAS, PCA) are common. The v3.0 data model can ingest both secondary and tertiary analysis data and link it to the assay data and/or phenotypic data available in the dataset. Currently, this is done via bespoke ingestion methods or via the Data Model Loader and dataset merger apps.

Annotation / Reference Data

The v3.0 framework expands support for annotation structures, with more flexibility for annotating various assays. Currently, annotation data is populated during assay ingestion.

Default Dashboards

With the v3.0 framework, dashboards are now directly explorable, so defaults are no longer needed and a dashboard can be shared with most users directly. If a default dashboard is desired, it is set in the properties of the dataset object as follows:

For every dashboard that should appear in the dropdown, add a property entry of the form dashboard-<display name> : <dashboard record-id>, where the display name can be any alphanumeric name, including spaces. Up to 10 entries can be added. To choose which dashboard is loaded by default, add one additional property, defaultDashboard : <display name>, where the display name matches one of the entries. The user must have viewer access to all records.

Example configuration with 3 dashboards:

dashboard-group 1 dash: record-G39384knk39ksdnf32
dashboard-group 2 dash: record-G393d9e949mgyJel42
dashboard-group 3 dash: record-G84nheiIKjenmlk84j
defaultDashboard: group 1 dash
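
The property layout described above can also be assembled and checked programmatically before it is applied to the dataset object. The following is a minimal sketch, not part of any official tooling: the function name build_dashboard_properties and its arguments are hypothetical, and it assumes that the 10-entry limit counts only the dashboard-<display name> entries and that display names contain only letters, digits, and spaces.

```python
import re

MAX_DASHBOARDS = 10            # limit stated in the text; assumed to exclude defaultDashboard
PREFIX = "dashboard-"
DEFAULT_KEY = "defaultDashboard"
# Assumption: "alphanumeric including spaces" means letters, digits, and spaces only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 ]+")


def build_dashboard_properties(dashboards, default=None):
    """Build a properties dict from a {display name: record ID} mapping.

    Hypothetical helper for illustration; it only builds and validates the
    key/value pairs and does not talk to the platform.
    """
    if len(dashboards) > MAX_DASHBOARDS:
        raise ValueError(f"at most {MAX_DASHBOARDS} dashboard entries are allowed")
    for name in dashboards:
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid display name: {name!r}")
    if default is not None and default not in dashboards:
        raise ValueError(f"default {default!r} does not match any entry")

    # One "dashboard-<display name>" property per dashboard record.
    props = {PREFIX + name: record_id for name, record_id in dashboards.items()}
    if default is not None:
        props[DEFAULT_KEY] = default
    return props


if __name__ == "__main__":
    props = build_dashboard_properties(
        {
            "group 1 dash": "record-G39384knk39ksdnf32",
            "group 2 dash": "record-G393d9e949mgyJel42",
            "group 3 dash": "record-G84nheiIKjenmlk84j",
        },
        default="group 1 dash",
    )
    for key, value in props.items():
        print(f"{key}: {value}")
```

The resulting dictionary would then be set as the properties of the dataset object with whatever platform tooling is already in use. Validating up front catches a defaultDashboard value that does not match any entry before the properties are written.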

V1.1 Dataset

Most datasets ingested before Q4 2020 were in v1.1 format. The Legacy Cohort Browser supports visualizing v1.1 datasets. Additionally, the Association Browser and Association Results work exclusively with v1.1 datasets.

Default Dashboards

The default dashboards in v1.1 are embedded in the dataset. Because of this, to set or update the defaults, the dataset must be recreated using Data Model Loader with the skip ingestion setting.