Defining and Managing Cohorts

Create, filter, and manage patient cohorts using clinical, genomic, and other data fields in the Cohort Browser.

An Apollo license is required to use Cohort Browser on the DNAnexus Platform. Org approval may also be required. Contact DNAnexus Sales for more information.

Create comprehensive patient cohorts by filtering your datasets. You can combine, compare, and export your cohorts for further analysis.

If you'd like to visualize data in your cohorts, see Creating Charts and Dashboards.

Managing Cohorts

When you start exploring a dataset, Cohort Browser automatically creates an empty cohort that includes all patients/samples. You can then define your cohort criteria by adding filters, and repeat the process to create additional cohorts.

The Cohorts panel gives you an overview of your active cohorts on the dashboard (up to 2) and the recently used cohorts (up to 8) in your current session. These can be temporary unsaved cohorts as well as saved cohorts.

To change the active cohorts on the dashboard, you need to swap them between the Dashboard and Recent sections:

In Cohorts > Dashboard, click In Dashboard to remove a cohort from the dashboard.
In Cohorts > Recent, click Add to Dashboard next to the cohort you want to add to the dashboard.

This way you can quickly explore, compare, and iterate across multiple cohorts within a single session.

Defining Cohort Criteria

Adding Clinical and Phenotypic Filters

To apply a filter to your cohort:

For the cohort you want to edit, click Add Filter.
In Add Filter to Cohort > Clinical, select a data field to filter by.
Click Add Cohort Filter.
In Edit Filter, select operators and enter the values to filter by.
Click Apply Filter.

After you apply or edit filters, the participant count updates immediately. However, visualization tiles do not automatically refresh. Click Refresh Visualizations at the top of the dashboard to update all tiles. Click Refresh on individual tiles to update specific charts.

Adding Assay Filters

With multi-assay datasets, you can create cohorts by applying filters from multiple assay types and instances.

When adding filters, you can find assay types under the Assays tab. This allows you to create cohorts that combine different types of data. For example, you can filter patients based on both clinical characteristics and germline variants, merge somatic mutation criteria with gene expression levels, or build cohorts that span multiple assays of the same type.

To learn more about filtering by specific assay types, see:

When working with an omics dataset that includes multiple assays, such as a germline dataset with both WES and WGS assays, you can:

Select specific assays to choose which assay to filter on.
Apply different filters per assay.
Create separate cohorts for different assays of the same type and compare results.

Filter Limits by Assay Type

The maximum number of filters allowed varies by assay type and is shared across all instances of that type:

Germline variant assays: 1 filter maximum
Somatic variant assays: Up to 10 filter criteria
Gene expression assays: Up to 10 filter criteria
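
To illustrate what "shared across all instances" means, here is a minimal Python sketch that counts filters per assay type, regardless of which assay instance each filter targets. The type keys (`germline`, `somatic`, `gene_expression`), the instance names, and the `exceeded_limits` helper are hypothetical. They model the limits listed above and are not part of any DNAnexus API.

```python
from collections import Counter

# Limits as listed above; the dictionary keys are illustrative labels only.
LIMITS = {"germline": 1, "somatic": 10, "gene_expression": 10}

def exceeded_limits(filters):
    """Return {assay_type: filter_count} for every type over its limit.

    `filters` is an iterable of (assay_type, assay_instance) pairs,
    one pair per applied filter. Counts are pooled per assay type.
    """
    counts = Counter(assay_type for assay_type, _instance in filters)
    return {t: n for t, n in counts.items() if t in LIMITS and n > LIMITS[t]}

# One germline filter on a WES assay plus one on a WGS assay: under this
# model, both count toward the single germline filter allowed.
applied = [("germline", "WES"), ("germline", "WGS"), ("somatic", "panel_a")]
over = exceeded_limits(applied)
```

In this sketch, `over` reports the germline type because its two filters are pooled across the WES and WGS instances.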

Creating Filter Groups

If you add multiple filters from the same category, such as Patient or Sample, they automatically form a filter group.

By default, filters within a filter group are joined by the logical operator 'AND', meaning that all filters in the group must be satisfied for a record to be included in the cohort. You can change the logical operator used within the group to 'OR' by clicking on the operator.
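
The AND/OR behavior of a filter group can be sketched in Python as follows. This only models the logic described above. The record fields, the two example filters, and the `matches_group` helper are hypothetical and do not correspond to Cohort Browser internals.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Predicate = Callable[[Record], bool]

def matches_group(record: Record, filters: Iterable[Predicate], operator: str = "AND") -> bool:
    """Evaluate every filter in the group and combine the results."""
    results = [f(record) for f in filters]
    if operator == "AND":
        return all(results)   # every filter must be satisfied
    if operator == "OR":
        return any(results)   # at least one filter must be satisfied
    raise ValueError(f"unsupported operator: {operator}")

# Two hypothetical filters from the same category (Patient).
def age_over_50(r: Record) -> bool:
    return r["age"] > 50

def is_smoker(r: Record) -> bool:
    return bool(r["smoker"])

patient = {"age": 62, "smoker": False}
and_result = matches_group(patient, [age_over_50, is_smoker], "AND")
or_result = matches_group(patient, [age_over_50, is_smoker], "OR")
```

With the default AND operator this patient is excluded, because only one of the two filters is satisfied. Switching the group to OR includes the patient.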

Joining Multiple Filters

Join filters allow you to create cohorts by combining criteria across multiple related data entities within your dataset. This is useful when working with complex datasets that contain interconnected information, such as patient records linked to visits, medications, lab tests, or other clinical data.

Understanding Data Entities

An entity is a grouping of data around a unique item, event, or concept.

In the Cohort Browser, an entity can refer either to a data model object, such as patient or visit, or to a specific input parameter in the Table Exporter app.

Common examples of data entities include:

Patient: Demographics, medical history, baseline characteristics
Visit: Hospital admissions, appointments, encounters
Medication: Prescriptions, dosages, administration records
Lab Test: Results, procedures, sample information

Creating Join Filters

To create join filters that span multiple data entities:

Start a new join filter: On the cohort panel, click Add Filter or, on a chart tile, click Cohort Filters > Add Cohort Filter.
Select secondary entity: Choose data fields from a secondary entity (different from your primary entity) to create the join relationship.
Add criteria to existing joins: To expand an existing join filter, click Add additional criteria on the row of the chosen filter.

Working with Logical Operators

Join filters support both AND and OR logical operators to control how criteria are combined:

AND logic: All specified criteria must be met
OR logic: Any of the specified criteria can be met

Key rules for logical operators:

Click the operator buttons to switch between AND and OR logic.
For a specific level of join filtering, joins are either all AND or all OR.
When using OR for join filters, the existence condition applies first: "where exists, join 1 OR join 2".

Building Complex Join Structures

As your filtering needs become more sophisticated, you can create multi-layered join structures:

Add criteria to branches: Further define secondary entities by adding additional criteria to existing join branches
Create nested joins: Add more layers of join filters that derive from the current branch
Automatic field filtering: The field selector automatically hides fields that are ineligible based on the current join structure

Practical Examples

The following examples show how join filters work in practice:

First Example Cohort - Separate Conditions: This cohort identifies all patients with a "high" or "medium" risk level who meet both of these conditions:
- Have a first hospital visit (visit instance = 1)
- Have had a "nasal swab" lab test at any point (not necessarily during the first visit)
Second Example Cohort - Connected Conditions: This cohort includes all patients with a "high" or "medium" risk level who had the "nasal swab" test performed specifically during their first visit, creating a more restrictive temporal relationship between the visit and lab test.
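
The difference between the two example cohorts can be sketched with plain Python sets. The toy tables and field names below (`risk`, `instance`, `visit_id`, `test`) are invented for illustration and are not taken from any real dataset. The sketch models only the selection logic, not how the Cohort Browser executes joins.

```python
# Toy data: three patients, their visits, and their lab tests.
patients = [
    {"id": "P1", "risk": "high"},
    {"id": "P2", "risk": "medium"},
    {"id": "P3", "risk": "low"},
]
visits = [
    {"id": "V1", "patient": "P1", "instance": 1},
    {"id": "V2", "patient": "P1", "instance": 2},
    {"id": "V3", "patient": "P2", "instance": 1},
    {"id": "V4", "patient": "P3", "instance": 1},
]
lab_tests = [
    {"patient": "P1", "visit_id": "V2", "test": "nasal swab"},  # during second visit
    {"patient": "P2", "visit_id": "V3", "test": "nasal swab"},  # during first visit
    {"patient": "P3", "visit_id": "V4", "test": "nasal swab"},  # during first visit
]

at_risk = {p["id"] for p in patients if p["risk"] in {"high", "medium"}}
first_visit_ids = {v["id"] for v in visits if v["instance"] == 1}

# First example: two separate join branches, each checked independently.
has_first_visit = {v["patient"] for v in visits if v["instance"] == 1}
has_swab = {t["patient"] for t in lab_tests if t["test"] == "nasal swab"}
separate = at_risk & has_first_visit & has_swab

# Second example: the lab test is nested under the visit, so the swab
# must belong to the first visit itself.
swab_in_first_visit = {
    t["patient"]
    for t in lab_tests
    if t["test"] == "nasal swab" and t["visit_id"] in first_visit_ids
}
connected = at_risk & swab_in_first_visit
```

Here P1 qualifies for the first cohort but not the second, because P1's nasal swab was taken during a later visit.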

Saving Cohorts

You can save your cohort selection to a project as a cohort record by clicking Save Cohort in the top-right corner of the cohort panel.

Cohorts are saved with their applied filters, as well as the latest visualizations and dashboard layout. Like other dataset objects, you can find your saved cohorts under the Manage tab in your project.

To open a cohort, double-click it or click Explore Data.

Need to use your cohorts with a different dataset? Use the Rebase Cohorts And Dashboards app to transfer your saved cohorts to a new target Apollo Dataset.

Exporting Data from Cohorts

For each cohort, you can export a list of main entity IDs in your current cohort selection as a CSV file by clicking Export sample IDs.

Data Preview

On the Data Preview tab, you can export tabular information as record IDs or a CSV file. Select multiple table rows to see export options in the top-right corner. Exports include only the fields displayed in the Data Preview tab.

The Data Preview supports up to 30 columns per tab. Tables with 30-200 columns show column names only. In such cases, you can save cohorts but data is not queried. Tables with over 200 columns are not supported.

You can view up to 30,000 records in the Data Preview. If your cohort exceeds this size, the table may not display all data. For larger exports, use the Table Exporter app.

If your view contains more than one table, such as a participants table and a hospital records table, exporting to CSV or TSV generates a separate file for each table.

Download Restrictions

The Cohort Browser follows your project's download policy restrictions. Downloads are blocked when:

Database restrictions apply: If the database storing your dataset has restricted download permissions, you cannot download data from any Cohort Browser view of that dataset, regardless of which project contains the cohort or dashboard.
All dataset copies are restricted: When every copy of your dataset exists in projects with restricted download policies, downloads are blocked. However, if at least one copy exists in a project that allows downloads, then downloads are permitted.
Cohort or dashboard restrictions apply: If the specific cohort or dashboard you're viewing has restricted download permissions, downloads are blocked regardless of the underlying dataset permissions.
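
The three conditions above can be read as a single decision function. The following Python sketch is one interpretation of that logic. The function name and its parameters are hypothetical, and the platform's actual policy evaluation may consider additional factors.

```python
from typing import Sequence

def downloads_blocked(
    database_restricted: bool,
    dataset_copies_restricted: Sequence[bool],
    view_restricted: bool,
) -> bool:
    """Return True if any of the three blocking conditions applies.

    dataset_copies_restricted holds one flag per project that contains a
    copy of the dataset (True means that project restricts downloads).
    view_restricted refers to the cohort or dashboard being viewed.
    """
    if database_restricted:
        return True
    # Blocked only when every copy is restricted; one permissive copy is enough.
    if dataset_copies_restricted and all(dataset_copies_restricted):
        return True
    return view_restricted

# One of two dataset copies allows downloads, and nothing else is restricted.
allowed_case = downloads_blocked(False, [True, False], False)
# Every dataset copy sits in a restricted project.
blocked_case = downloads_blocked(False, [True, True], False)
```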

Combining Cohorts

You can create complex cohorts by combining existing cohorts from the same dataset.

To combine cohorts, do one of the following:

Near the cohort name, click + > Combine Cohorts.
In the Cohorts panel, click Combine Cohorts.

You can also create a combined cohort based on the cohorts already being compared.

The Cohort Browser supports the following combination logic:

| Logic | Description | Number of Cohorts Supported |
| --- | --- | --- |
| Intersection | Select members that are present in ALL selected cohorts. Example: the intersection of cohorts A, B, and C is A ∩ B ∩ C. | Up to 5 cohorts |
| Union | Select members that are present in ANY of the selected cohorts. Example: the union of cohorts A, B, and C is A ∪ B ∪ C. | Up to 5 cohorts |
| Subtraction | Select members that are present only in the first selected cohort and not in the second. Example: the subtraction of cohorts A and B is A - B. | 2 cohorts |
| Unique | Select members that appear in exactly one of the selected cohorts. Example: the unique of cohorts A and B is (A - B) ∪ (B - A). | 2 cohorts |

Once a combined cohort is created, you can inspect the combination logic and its original cohorts in the cohort filters section.

Cohorts already combined cannot be combined a second time.
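
The four combination types correspond to standard set operations. As a small illustration, here they are in Python, with made-up member IDs standing in for cohort members:

```python
# Hypothetical member IDs for three cohorts from the same dataset.
a = {"P1", "P2", "P3"}
b = {"P2", "P3", "P4"}
c = {"P3", "P5"}

intersection = a & b & c   # members present in ALL cohorts
union = a | b | c          # members present in ANY cohort
subtraction = a - b        # members in the first cohort but not the second
unique = a ^ b             # members in exactly one of the two cohorts
```

The symmetric difference `a ^ b` is equivalent to `(a - b) | (b - a)`, which matches the definition of Unique given above.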

Comparing Cohorts

You can compare two cohorts from the same dataset by adding both cohorts into the Cohort Browser.

To compare cohorts, click + next to the cohort name. You can create a new cohort, duplicate the current cohort, or load a previously saved cohort.

When comparing cohorts:

All visualizations are converted to show data from both cohorts.
You can continue to edit both cohorts and visualize the results dynamically.

You can compare a cohort with its complement in the dataset by selecting Compare / Combine Cohorts > Not In …. As with combining cohorts, you need to save your current cohort before creating its not-in counterpart.

| Logic | Description |
| --- | --- |
| Not In | Select patients that are present in the dataset, but not in the current cohort. Example: In dataset U, the result of "Not In" A would be U - A. |

Cohorts created using Not In cannot be used for further creation of combined or not-in cohorts. "Not In" cohorts are linked to the cohort they are originally based on. Once a not-in cohort is created, further changes to the original cohort definition are not reflected.

Creating Cohorts via CLI

The dx command create_cohort generates a new Cohort object on the platform using an existing Dataset or Cohort object, and a list of primary IDs. The filters are applied to the global primary key of the dataset/cohort object.

When the input is a CohortBrowser typed record, the existing filters are preserved and the output record has additional filters on the global primary key. The filters are combined so that the resulting record is the intersection of the IDs present in the original input and the IDs passed through the CLI.
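
The intersection behavior can be modeled with a short Python sketch. This is a conceptual illustration only: it does not call the dx toolkit, and the toy dataset, the `existing_filter` predicate, and the ID values are all hypothetical. It also does not model any validation the command may perform on the supplied IDs.

```python
# Toy dataset keyed by primary ID.
dataset = {
    "P1": {"age": 40},
    "P2": {"age": 65},
    "P3": {"age": 70},
}

def existing_filter(record):
    """Stand-in for the filters already saved on an input cohort."""
    return record["age"] > 60

# IDs that would be supplied on the command line.
cli_ids = {"P1", "P2"}

# Input is a Dataset: only the primary-key filter applies.
from_dataset = set(dataset) & cli_ids

# Input is a Cohort: existing filters are preserved, then intersected
# with the supplied IDs.
input_cohort_ids = {pid for pid, rec in dataset.items() if existing_filter(rec)}
from_cohort = input_cohort_ids & cli_ids
```

In this model, P1 appears in the result only when the input is the full dataset, because the input cohort's existing filter already excludes it.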

For additional details, see the create_cohort command reference and example notebooks in the public GitHub repository, DNAnexus/OpenBio.
