Cohort Browser

NOTE: Not all features are included in all packages. Please contact sales@dnanexus.com for more information.

Overview

DNAnexus Apollo builds on the technological foundation of the core DNAnexus platform to offer scientists and bioinformaticians an environment to store and query large sets of genomic, phenotypic, multi-omic, and other structured data. Researchers can bring their data to the platform and leverage DNAnexus apps to ingest the data into queryable databases.

These databases can then be explored using the Cohort Browser. Scientists can filter the set of samples by any field within the dataset and save the filtered samples as cohort objects. Cohort objects can be shared with other scientists and used as inputs to analysis apps for tasks such as calculating allele frequencies or running a GWAS.

Bioinformaticians who wish to perform ad hoc statistical analysis are able to spin up JupyterLab environments backed by Spark clusters to directly query their data and create dataframes within a Python or R environment for further analysis.

Launching the Cohort Browser

  1. First find a dataset (record object type) of interest.

  2. Select the dataset, then select Explore Data (available from the right-click menu, the button, or the menu).

Exploring Datasets and Creating Cohorts

Your dataset will start with a default view that may or may not contain some database fields presented as charts.

Finding and adding fields of interest

  1. To add a field or chart, click Add Tile. The Add Tile dialog shows a hierarchical view of all the fields present within the dataset.

  2. Browse the list or search for an item to find fields of interest. The search matches keywords in both field names and field values.

    Selecting a field will display its metadata and charting options. For example, a field with continuous values such as ‘Age’ may show histogram and boxplot options:

NOTE: Different field types may contain different types of metadata, different field options and different charting options.

  3. Use Add as Tile to add fields of interest to your dashboard. Add as many tiles as you like and then close the window.

Add as Tile

Customizing the Dashboard

The dashboard section displays your tiles. With tiles you can:

  • Rearrange them by dragging tiles to a new position on the screen

  • Resize them by dragging from the lower right corner

  • Remove them by clicking the "x"

  • Display their metadata by opening the "i" section

  • Review any filters that are currently applied by opening the filter icon

Example dashboard of tiles

Filtering the Data

  1. Click on the charts to create filters. Note that not all charts allow filtering.

  2. Bar chart: click on a bar to include the represented values in your filter criteria. For multi-select fields, you can toggle "match any" to "match all" to narrow your filter.

  3. Histogram: drag-select to choose a range of values for your filter.

  4. List boxes: click items to include them in your filter. Click a parent folder to include all the children values. Use the search box to find specific items in long lists. Clear the search box to return to the full list view.

Examples of filters on charts
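The difference between "match any" and "match all" on a multi-select field can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the Cohort Browser's implementation; the function name, the sample IDs, and the field values are all made up for the example.

```python
def matches(selected, values, mode="any"):
    """Return True if a sample's multi-select values satisfy the filter.

    selected: values chosen in the filter (hypothetical example data)
    values:   values recorded for one sample in a multi-select field
    mode:     "any" keeps samples with at least one selected value,
              "all" keeps only samples that have every selected value
    """
    selected = set(selected)
    values = set(values)
    if mode == "any":
        return bool(selected & values)
    if mode == "all":
        return selected <= values
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up samples with a multi-select field.
samples = {
    "S1": ["asthma", "diabetes"],
    "S2": ["asthma"],
    "S3": ["hypertension"],
}
selected = ["asthma", "diabetes"]

match_any = sorted(s for s, v in samples.items() if matches(selected, v, "any"))
match_all = sorted(s for s, v in samples.items() if matches(selected, v, "all"))
print(match_any)  # ['S1', 'S2']
print(match_all)  # ['S1']
```

Switching from "match any" to "match all" can only shrink the cohort, which is why the toggle is described as a way to narrow the filter.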

After you create a filter the cohort count will update. To refresh all the data on the dashboard, click Refresh Dashboard in the banner that appears.

Filters can be removed by selecting Reset Filters within the tile filter menu or by selecting the 'x' on the filter pill that appears at the top of the dashboard.

Filter pills. Hover over a pill to reveal the 'x'.

Adding Genomic Filters

  1. Click Edit Genomic Filter at the top of the dashboard to view options for filtering the cohort based on variant status:

    • Filter by gene and variant effect: specify a gene of interest, then select the transcribed variant effects to retain in the filter. For example, this type of filter is useful for keeping or excluding loss-of-function variants in a particular gene. A maximum of 5 genes can be entered.

    • Filter by variant ID (RSID or allele coordinate): specify the variants of interest directly to retain individuals with any of these variants in the cohort. For example, this filter can focus on a known target or on a list of the top hits from a previous GWAS. A maximum of 100 variants can be entered.

Genomic Filter Dialog

NOTE: Entering multiple genes or variants returns individuals who have a variant in any of the listed genes, or who carry any of the listed variants. The entries are combined with OR, not AND: an individual does not need to match every entry.
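As a rough sketch of the behavior in this note, the variant ID filter can be pictured as a set intersection test per individual. The code below is an assumption-laden illustration, not the platform's code: `filter_by_variants`, the `carried` dictionary, and the rsIDs are hypothetical, and only the 100-variant limit comes from the text above.

```python
MAX_VARIANTS = 100  # limit stated in the documentation above


def filter_by_variants(carried, variant_ids):
    """Keep individuals who carry at least one of the listed variants.

    carried: dict mapping an individual ID to the set of variant IDs
             that individual carries (a hypothetical in-memory stand-in
             for the real dataset).
    """
    wanted = set(variant_ids)
    if len(wanted) > MAX_VARIANTS:
        raise ValueError(f"at most {MAX_VARIANTS} variants can be entered")
    # An individual is kept when the two sets overlap at all (OR semantics).
    return sorted(ind for ind, variants in carried.items() if wanted & variants)


# Made-up individuals and variant IDs.
carried = {
    "P1": {"rs123", "rs456"},
    "P2": {"rs456"},
    "P3": {"rs789"},
}
cohort = filter_by_variants(carried, ["rs123", "rs456"])
print(cohort)  # ['P1', 'P2']
```

P2 carries only one of the two listed variants and is still retained, which is the "any of" behavior the note describes.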

Using the Samples Table

The Samples Table sits below the dashboard tiles. It lists the Sample IDs of all participants in your current cohort (that is, your filtered dataset), with one sample per row. Each tile in your dashboard has a column showing the corresponding value for each sample. You can remove columns using the column chooser (the small icon in the right-most column).

Samples Table

From the Samples Table, you can:

  • Further refine the view by searching for any value in the columns. Note that this does not change your cohort.

  • Select the Download icon to export selected items to a .csv file.

NOTE: The Samples Table limits the display to a maximum of 30,000 records.

Using the Variant Browser and Table

A dashboard may contain a Variant Browser section that displays the variants in the cohort.

This section includes a lollipop chart displaying allele frequencies for variants in your cohort across a single gene or a user-specified region that is a maximum of 5 million bases in length. An associated table is provided below the chart that lists the variants along with other information:

  • Type: Whether it is a SNP, Del, or Ins.

  • Consequence: The impact of the variant according to SnpEff. Examples include 'exon loss variant' and 'missense variant'.

  • Population AF: The allele frequency across the complete dataset from which the cohort was created. In this case, that is the 100K of genomic data provided to Takeda by UK Biobank.

  • GnomAD AF: The allele frequency from the public gnomAD dataset.

  • Cohort AF: The allele frequency calculated for the current cohort. This value updates as you update your cohort.
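To make the difference between Population AF and Cohort AF concrete, here is a minimal sketch of an allele frequency calculation. It assumes diploid genotypes stored as alternate-allele counts (0, 1, or 2) with missing calls excluded from the denominator; the documentation does not state how the Cohort Browser computes these values, so treat the formula and all names and data below as illustrative assumptions.

```python
def allele_frequency(alt_counts, ploidy=2):
    """Alternate-allele frequency over samples with a genotype call.

    alt_counts: per-sample number of alternate alleles (0, 1 or 2 for
                diploid data); None marks a missing call (assumption).
    Returns None when no sample has a call.
    """
    called = [c for c in alt_counts if c is not None]
    if not called:
        return None
    return sum(called) / (ploidy * len(called))


# Made-up genotypes for one variant across a five-sample dataset.
dataset = {"S1": 0, "S2": 1, "S3": 2, "S4": None, "S5": 0}
cohort_ids = ["S2", "S3"]  # samples remaining after filtering

population_af = allele_frequency(dataset.values())
cohort_af = allele_frequency(dataset[s] for s in cohort_ids)
print(population_af)  # 0.375
print(cohort_af)      # 0.75
```

Under these assumptions, the population value stays fixed while the cohort value is recomputed over whichever samples remain after filtering.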

NOTE: Downloading data through the visualization UI is not suitable for large datasets. Use the SQL Runner app to download data more efficiently.

Use the search box in the chart to select and zoom in on a specific range. You can search by entering genomic coordinates or a gene name. The results also update the table contents. The table of variants can be filtered and sorted to find variants of interest. To see variant details (transcripts, annotations, etc.), select the link in the Location column.
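The 5 million base limit on a user-specified region can be checked before pasting coordinates into the search box. The sketch below assumes a `chr:start-end` notation with inclusive coordinates; the exact formats accepted by the search box are not specified here, so the pattern and the function name `parse_region` are hypothetical.

```python
import re

MAX_REGION_BASES = 5_000_000  # maximum region length stated above

# Assumed notation: optional "chr" prefix, chromosome, then start-end.
_REGION = re.compile(r"^(?:chr)?([0-9]{1,2}|X|Y|MT?):([0-9,]+)-([0-9,]+)$")


def parse_region(text):
    """Parse 'chr1:1,000,000-2,000,000' into (chrom, start, end).

    Raises ValueError for malformed input, a reversed range, or a
    region longer than MAX_REGION_BASES (inclusive coordinates assumed).
    """
    m = _REGION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start must not be greater than end")
    if end - start + 1 > MAX_REGION_BASES:
        raise ValueError(f"region exceeds {MAX_REGION_BASES} bases")
    return chrom, start, end


region = parse_region("chr1:1,000,000-2,000,000")
print(region)  # ('1', 1000000, 2000000)
```

A region such as `1:1-6000000` would be rejected by this check because it spans more than 5 million bases.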

Downloading or Copying Variants

  1. To copy variants, use the checkboxes to select either individual rows or select the entire table.

  2. Then choose the format in which to copy the IDs. You can paste the results into any text or document file, or use them to create a cohort with the genomic filter dialog box. Alternatively, select the Download icon to download the selected items as a .csv file.

Saving Cohorts

You can save the current cohort as a record object in the selected project by clicking the Save icon in the upper right corner of the Cohort Browser. The cohort object saves the precise set of filters used to generate the cohort, as well as the specific set of phenotypic chart tiles currently on your dashboard.

Downloading Data

To download data from tables, select the items you'd like to download (or click the select-all check box to include the entire table) and then select Download.

Dashboard Views

You can create different dashboard views to see different groupings of phenotypic fields. Once you have a set of tiles that you'd like to keep, use the Dashboard Options menu to save the dashboard view. The view is saved as a file with type DashboardView. To open a view later, find the file in the Dashboard Options dialog.

Dashboard Options dialog