Cohort Browser

NOTE: Not all features are included in all packages. Please contact sales@dnanexus.com for more information.

Overview

DNAnexus Apollo builds on the technological foundation of the core DNAnexus platform to offer scientists and bioinformaticians an environment to store and query large sets of genomic, phenotypic, multi-omic, and other structured data. Researchers can bring their data to the platform and leverage DNAnexus apps to ingest the data into queryable databases.

These databases can then be explored using the Cohort Browser. Scientists can filter the set of samples by any field within the dataset and save these filtered samples as cohort objects. These cohort objects can be shared with other scientists and can also be used as inputs to analysis apps for tasks such as calculating allele frequencies or performing a GWAS.

Bioinformaticians who wish to perform ad hoc statistical analysis are able to spin up JupyterLab environments backed by Spark clusters to directly query their data and create dataframes within a Python or R environment for further analysis.

Accessing Datasets

To use the Cohort Browser, you must first find a dataset of interest. The Datasets page displays all datasets in one location and enables you to browse datasets or use filters to find a specific one. The Datasets page includes an optional information panel in which you can view a dataset's creator, sponsorship, and other information.

To access a dataset

  1. From the Projects menu, select Datasets. The Datasets page appears.

  2. Search for the dataset, or sort the columns to locate it.

    NOTE: You can customize the columns you see by clicking the Columns icon and selecting the columns in the list that displays.

  3. Select the dataset and then select Explore Data. The Cohort Browser launches with the selected dataset.

To display the information panel

  1. From the Datasets page, click on a dataset of interest and then select the Information icon.

Datasets window showing the location of the Information icon.

Exploring Datasets and Creating Cohorts

Your dataset will start with a default view that may or may not contain some database fields presented as charts.

Finding and adding fields of interest

  1. To add a field or chart, click Add Tile. The Add Tile dialog shows a hierarchical view of all the fields present within the dataset.

  2. Browse the list or search for an item to find fields of interest. The search function looks for keywords in both field names and field values.

    Selecting a field will display its metadata and charting options. For example, a field with continuous values such as ‘Age’ may show histogram and boxplot options:

NOTE: Different field types may contain different types of metadata, different field options and different charting options. Up to 30 tiles can be added to a dashboard.

  3. Use Add as Tile to add fields of interest to your dashboard. Add as many tiles as you like and then close the window.


There are several types of fields:

  • CAT (abc) includes categorical integer and categorical string. These fields can be multi-select or single-select.

  • NUM (123) includes continuous integer and continuous float.

  • DAT includes date and date-time.
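The grouping above can be sketched as a simple lookup. This is only an illustration: the type names used as keys are hypothetical spellings, not identifiers taken from the platform.

```python
# Hypothetical type names; a real dataset schema may spell these differently.
FIELD_GROUPS = {
    "categorical_integer": "CAT",
    "categorical_string": "CAT",
    "continuous_integer": "NUM",
    "continuous_float": "NUM",
    "date": "DAT",
    "date_time": "DAT",
}


def field_group(field_type: str) -> str:
    """Return the group label (CAT, NUM, or DAT) for a field type."""
    try:
        return FIELD_GROUPS[field_type]
    except KeyError:
        raise ValueError(f"unknown field type: {field_type!r}") from None


print(field_group("continuous_float"))  # NUM
```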

Customizing Charts

Selecting a field will display its metadata and charting options. For example, a field with continuous values such as ‘Age’ may show histogram and boxplot options:

Different field types may contain different types of metadata, different field options and different charting options. Use the Add as Tile button to add fields of interest to your dashboard.

To create a chart that compares two fields, select the first field. Next, search or browse for a second field and click the plus sign (“+”) to the right of the field name to add it. The “+” sign is active only for fields that can be combined. Note: Categorical fields with more than 20 items are currently not available for combining.

Use the plus sign on the right to combine fields into a single chart
Drag to rearrange the order of the two fields.

To swap axes, drag the field name on the right panel to rearrange the order.

Click Add as Tile to add it to your dashboard.

Customizing the Dashboard

The dashboard section displays your tiles. With tiles you can:

  • Rearrange them by dragging tiles to a new position on the screen

  • Resize them by dragging from the lower right corner

  • Remove them by clicking the "x"

  • Display their metadata -- located in the "i" section

  • Review any filters that are currently applied by opening the filter icon

Example Dashboard of Tiles

Filtering the Data

  1. Click on the charts to create filters. (Note: Not all charts allow filtering.)

  2. Bar chart: click on a bar to include the represented values in your filter criteria. For multi-select fields, you can toggle the "match any" to "match all" to constrain your filter. For example, a "match any" query will find subjects with value a OR value b. The "match all" option will find subjects only with both value a AND value b.

  3. Histogram: drag-select to choose a range of values for your filter.

  4. List boxes: click items to include them in your filter. Click a parent folder to include all the children values. Use the search box to find specific items in long lists. Clear the search box to return to the full list view.

Example of Filters in Charts
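The difference between "match any" and "match all" for a multi-select field can be sketched with set operations. The subject IDs and values below are made up for illustration, and the Cohort Browser's internal implementation is not described here.

```python
def match_any(subject_values, selected):
    """True if the subject has at least one selected value (a OR b)."""
    return bool(set(subject_values) & set(selected))


def match_all(subject_values, selected):
    """True if the subject has every selected value (a AND b)."""
    return set(selected) <= set(subject_values)


# Hypothetical multi-select field: values recorded per subject.
subjects = {
    "S1": ["a"],
    "S2": ["a", "b"],
    "S3": ["c"],
}
selected = ["a", "b"]

any_cohort = sorted(s for s, v in subjects.items() if match_any(v, selected))
all_cohort = sorted(s for s, v in subjects.items() if match_all(v, selected))

print(any_cohort)  # ['S1', 'S2']
print(all_cohort)  # ['S2']
```

Switching from "match any" to "match all" can only shrink the result or leave it unchanged, which is why it is described as constraining the filter.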

After you create a filter the cohort count will update. To refresh all the data on the dashboard, click Refresh Dashboard in the banner that appears.

Filters can be removed by selecting Reset Filters within the tile filter menu or by selecting the 'x' on the filter pill that appears at the top of the dashboard.

Filter pills. Hover over a pill to reveal the 'x' used to remove it.

Refining Filters

After you create an initial filter, you can refine it by clicking directly on the filter pills. Refining a filter also lets you exclude specific criteria. For example, you could filter by disease but exclude all subjects who have diabetes.

To refine filters

  1. Click on any of the filter pills from your existing filter. The Edit Filter dialog appears.

The Edit Filter dialog in which you can adjust filters.

  2. Adjust the filter and select Save.

To exclude criteria from filters

  1. Click on any of the filter pills from your existing filter. The Edit Filter dialog appears.

The Edit Filter dialog in which you can exclude one or more criteria

  2. Begin to type the item to exclude, and then select the item from the resulting list.
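As a rough sketch of what an exclusion does to a cohort, the disease example can be written as an include set combined with an exclude set. The subject records and disease names below are invented for illustration; this is not the platform's query logic.

```python
def in_cohort(diagnoses, include, exclude):
    """Keep a subject with any included value and none of the excluded values."""
    recorded = set(diagnoses)
    return bool(recorded & set(include)) and not (recorded & set(exclude))


# Hypothetical subjects and their recorded diseases.
subjects = {
    "S1": ["asthma"],
    "S2": ["asthma", "diabetes"],
    "S3": ["hypertension"],
    "S4": ["diabetes"],
}

cohort = sorted(
    sid
    for sid, diagnoses in subjects.items()
    if in_cohort(diagnoses, include=["asthma", "hypertension"], exclude=["diabetes"])
)
print(cohort)  # ['S1', 'S3']
```

S2 matches an included disease but is dropped because the exclusion takes precedence.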

Resetting Filters

If you have created a series of filters but don't like your most recent change, you can quickly revert to the filter created before you last saved by using the Reset Filter icon.

To reset filters

  1. On the Edit Filter dialog for the filter you want to revert, click the Reset Filter icon. The filter reverts to its state before you last clicked Save.

The Edit Filter dialog in which the Reset Filter icon is highlighted

Adding Genomic Filters

  1. Click Edit Genomic Filter at the top of the dashboard to view options for filtering the cohort based on variant status:

    • Filter by gene and variant effect: specify a gene of interest, then select the transcribed variant effects to retain in the filter. For example, this type of filter is useful to keep or exclude variant classes such as loss-of-function variants in a particular gene. A maximum of 5 genes can be entered.

    • Filter by variant ID (RSID or allele coordinate): specify the variants of interest directly to retain individuals with any of these variants in the cohort. For example, this filter can focus on a known target or on a list of the top hits from a previous GWAS. A maximum of 100 variants can be entered.

Genomic Filter Dialog

NOTE: Entering multiple genes or variants returns individuals with a variant in any of the listed genes, or with any of the listed variants. It does not restrict the cohort to individuals who match every entry.
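The "any of the entries" behavior and the entry limits can be sketched as follows. The data model (a mapping from individual ID to the set of genes in which that individual carries a qualifying variant) and the placeholder gene names are assumptions made for this example.

```python
MAX_GENES = 5       # limit stated for the gene filter
MAX_VARIANTS = 100  # limit stated for the variant ID filter


def carriers_of_any(carried_by_individual, entries, limit):
    """Return IDs of individuals carrying at least one entry (OR logic)."""
    if len(entries) > limit:
        raise ValueError(f"at most {limit} entries allowed, got {len(entries)}")
    wanted = set(entries)
    return sorted(
        pid for pid, carried in carried_by_individual.items() if carried & wanted
    )


# Hypothetical data: genes in which each individual carries a qualifying variant.
genes_hit = {
    "P1": {"GENE_A"},
    "P2": {"GENE_A", "GENE_B"},
    "P3": {"GENE_C"},
}

print(carriers_of_any(genes_hit, ["GENE_A", "GENE_B"], MAX_GENES))  # ['P1', 'P2']
```

P1 is returned even though it carries a variant in only one of the two genes. The same function models the variant ID filter when called with MAX_VARIANTS.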

Using the Samples Table

The Samples Table sits below the dashboard tiles. It lists the sample IDs of all participants in your current cohort (that is, your filtered dataset), one sample per row. There is a column for each tile in your dashboard, showing the corresponding value for each sample. You can remove columns using the column chooser (the small icon in the right-most column).

Customize table columns using column chooser

From the Samples Table, you can:

  • Further refine the view by searching for any value in the columns. Note that this does not change your cohort.

  • Select the Download icon to export selected items to a .csv file.

NOTE: The Samples Table limits the display to a maximum of 30,000 records.
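The table behavior described above (searching narrows the view without changing the cohort, and selected rows export to .csv) can be sketched with the standard library. The column names and rows are hypothetical, and the actual export layout may differ.

```python
import csv
import io

MAX_DISPLAYED = 30_000  # display limit stated in the note above


def search_rows(rows, text):
    """Return rows where any column contains text; the input list is not modified."""
    needle = text.lower()
    return [r for r in rows if any(needle in str(v).lower() for v in r.values())]


def to_csv(rows):
    """Serialize rows (dicts with identical keys) to CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical cohort rows; one column per dashboard tile.
cohort = [
    {"sample_id": "S1", "age": 54, "sex": "F"},
    {"sample_id": "S2", "age": 61, "sex": "M"},
]

shown = search_rows(cohort[:MAX_DISPLAYED], "s2")
print(to_csv(shown))
print(len(cohort))  # the cohort itself still has 2 samples
```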

Using the Variant Browser and Table

A dashboard may contain a Variant Browser section that displays the variants in the cohort.

This section includes a lollipop chart displaying allele frequencies for variants in your cohort across a single gene or a user-specified region that is a maximum of 5 million bases in length. An associated table is provided below the chart that lists the variants along with other information:

  • Type: Whether it is a SNP, Del, or Ins.

  • Consequence: The impact of the variant according to SnpEff. Examples include 'exon loss variant' and 'missense variant'.

  • Population AF: The allele frequency across the complete dataset from which the cohort was created (in this case, the 100K genomic data provided to Takeda by UK Biobank).

  • GnomAD AF: The allele frequency from the public dataset gnomAD.

  • Cohort AF: The allele frequency calculated for the current cohort. This value updates as you update your cohort.
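To illustrate how Population AF and Cohort AF relate, here is a minimal sketch of an allele frequency calculation for one variant. It assumes diploid genotypes coded as the number of alternate alleles (0, 1, or 2) and skips missing calls. The platform's exact method, including its handling of missing or multi-allelic genotypes, is not specified in this guide.

```python
def allele_frequency(alt_allele_counts):
    """Alternate allele frequency from per-sample alt allele counts (0, 1, 2).

    Assumes diploid samples; None marks a missing genotype and is skipped.
    """
    called = [c for c in alt_allele_counts if c is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


# Hypothetical genotypes for one variant across the complete dataset.
dataset = {"S1": 0, "S2": 1, "S3": 2, "S4": 0, "S5": None}
cohort_ids = ["S2", "S3"]  # samples remaining after filtering

population_af = allele_frequency(dataset.values())
cohort_af = allele_frequency(dataset[s] for s in cohort_ids)

print(population_af)  # 0.375
print(cohort_af)      # 0.75
```

Population AF stays fixed for a given dataset, while Cohort AF is recomputed over whichever samples remain after filtering.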

Note: Downloading data through the visualization UI is not suitable for large datasets. To download data more efficiently, use the SQL Runner app.

Use the search box in the chart to select and zoom in on a specific range. You can search by entering genomic coordinates or a gene name. The results also update the table contents. The table of variants can be filtered and sorted to find variants of interest. To see variant details (transcripts, annotations, etc.), select the link in the Location column.
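The 5 million base limit on user-specified regions can be checked before a search with a small helper like the one below. The "chr:start-end" input format and the inclusive length calculation are assumptions made for this sketch; the formats the search box actually accepts are not listed here.

```python
import re

MAX_REGION_BASES = 5_000_000  # maximum region length stated for the chart

# Assumed input format: "chr1:1,000-2,000" or "1:1000-2000".
_REGION = re.compile(r"^(?:chr)?(\w+):(\d+)-(\d+)$")


def parse_region(text):
    """Parse a region string and enforce the maximum length (inclusive ends)."""
    match = _REGION.match(text.replace(",", "").strip())
    if not match:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError("region end is before region start")
    if end - start + 1 > MAX_REGION_BASES:
        raise ValueError(f"region exceeds {MAX_REGION_BASES} bases")
    return chrom, start, end


print(parse_region("chr1:1,000-2,000"))  # ('1', 1000, 2000)
```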

Downloading or Copying Variants

  1. To copy variants, use the checkboxes to select individual rows or the entire table.

  2. Choose the format in which to copy the IDs. You can paste the results into any text or document file, or use them to create a cohort with the genomic filter dialog box. Alternatively, select the Download icon to download the selected items as a .csv file.

Saving Cohorts

You can save the current cohort as a record object inside of the selected project by clicking the Save icon in the upper right corner of the Cohort Browser. The cohort object saves the precise set of filters that were used to generate the cohort as well as the specific set of phenotypic chart tiles that are currently on your dashboard.
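Conceptually, a saved cohort bundles two things: the filters that define it and the tiles on the dashboard. The sketch below models that idea with a hypothetical structure. The class, its field names, and the JSON layout are illustrative only and do not reflect the actual record format used by the platform.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SavedCohort:
    """Hypothetical stand-in for a cohort object: filters plus dashboard tiles."""
    name: str
    dataset: str
    filters: list = field(default_factory=list)
    tiles: list = field(default_factory=list)


cohort = SavedCohort(
    name="example_cohort",
    dataset="example_dataset",
    filters=[{"field": "age", "op": "between", "values": [40, 60]}],
    tiles=["age", "sex"],
)

# Round-trip through JSON to show that filters and tiles travel together.
serialized = json.dumps(asdict(cohort))
restored = SavedCohort(**json.loads(serialized))
print(restored == cohort)  # True
```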

Downloading Data

To download data from tables, select the items you'd like to download (or click the select-all check box to include the entire table) and then select Download.

Dashboard Views

You can create different dashboard views to enable you to see different groupings of phenotypic fields. After you have a set of tiles that you'd like to save, use the Dashboard Options menu to save the dashboard view. The view is saved as a file with type DashboardView. When you want to open a view, find the file in the Dashboard Options dialog.

Comparing Cohorts

To view a comparison of two cohorts, open the first cohort and then select “Compare Cohort” in the header bar. Select a second cohort that is in the same project.

This opens a new browser tab showing a phenotypic comparison of the two cohorts.

Note: Not all chart types are supported. You can change the chart type to a supported format where that option exists (for example, bar charts can be changed to lists using the menu in the chart tile).

If a chart type is not supported, change it using the chart menu to a different type.

Each cohort is assigned a different color which can be seen next to the cohort name on the top of the screen, as well as in the legends within each chart or in the column headers.

Chart showing legend

In Compare mode, you can continue to add and remove tiles. You can also load existing dashboards. Certain features, such as saving dashboards, are not currently available.

Considerations

  1. To compare two cohorts, both cohorts need to be pointing to the same dataset and reside within the same project.

  2. Cohort Compare is only available for datasets and cohorts created on or after Jan 1, 2020.
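The considerations above can be expressed as a short pre-check. The dictionary keys (name, dataset, project, created) are hypothetical and were chosen for this sketch. It also checks only the cohort creation date, whereas the date condition applies to datasets as well.

```python
from datetime import date

COMPARE_CUTOFF = date(2020, 1, 1)  # date stated in the considerations


def compare_blockers(first, second):
    """Return the reasons two cohorts cannot be compared (empty list if none)."""
    blockers = []
    if first["dataset"] != second["dataset"]:
        blockers.append("cohorts point to different datasets")
    if first["project"] != second["project"]:
        blockers.append("cohorts are in different projects")
    for cohort in (first, second):
        if cohort["created"] < COMPARE_CUTOFF:
            blockers.append(
                f"{cohort['name']} was created before {COMPARE_CUTOFF.isoformat()}"
            )
    return blockers


# Hypothetical cohort descriptions.
cases = {"name": "cases", "dataset": "ds1", "project": "p1", "created": date(2021, 3, 1)}
controls = {"name": "controls", "dataset": "ds1", "project": "p1", "created": date(2019, 6, 1)}

print(compare_blockers(cases, cases))     # []
print(compare_blockers(cases, controls))  # ['controls was created before 2020-01-01']
```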