Learn about Apollo Datasets, how they're constructed, and how to use them.
An Apollo Dataset (or simply Dataset) is a DNAnexus record of type dataset that encapsulates both data and metadata. It maps the logical data structure (phenotypes, genotypes, etc.) to the physical layout of the underlying database(s) and metadata lookups. It lets you combine different data modalities (e.g., phenotypic data, clinical information, notes, genomic data, transcriptomic data) across multiple databases into a single linked, documented object. Because all of this information is stored in one record, which also contains the provenance of how it was created, Datasets simplify and enable the following:
1. Simple reuse within one experiment or across multiple experiments.
2. Easy modeling and sharing of multi-omic datasets.
3. Scaling to projects the size of TCGA or UKB, where sets of raw data can reach petabyte scale.
4. Faster reproducibility through the maintenance of data linkages and annotations.
5. Lower cost of building tools, since a predictable, well-structured, and documented framework is available to build from.
You can conveniently view all of your Datasets in the user interface by selecting Dataset from the Project menu.
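Because a Dataset is an ordinary record with a type, the same listing can be approximated from a script. The following is a minimal sketch, not an official recipe: it assumes the `dxpy` Python package is installed and logged in, that the record type string is spelled `Dataset`, and it uses a placeholder project ID. The helper names `is_dataset_record` and `list_datasets` are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, List


def is_dataset_record(describe: Dict[str, Any]) -> bool:
    """Return True if a describe() result looks like a Dataset record.

    Assumption: Dataset records have class "record" and carry the
    type string "Dataset" in their "types" list.
    """
    return describe.get("class") == "record" and "Dataset" in describe.get("types", [])


def list_datasets(project_id: str) -> List[Dict[str, Any]]:
    """List describe() results for Dataset records in one project.

    Requires the dxpy package and an authenticated session; the import is
    kept local so the helper above can be used without dxpy installed.
    """
    import dxpy

    results = dxpy.find_data_objects(
        classname="record",
        typename="Dataset",  # assumed spelling of the record type
        project=project_id,
        describe=True,
    )
    return [r["describe"] for r in results if is_dataset_record(r["describe"])]


if __name__ == "__main__":
    # "project-xxxx" is a placeholder; substitute a real project ID.
    for desc in list_datasets("project-xxxx"):
        print(desc["id"], desc["name"])
```

The type check is kept separate from the search call so it can be reused on describe output obtained in other ways.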
Example uses of a Dataset include: