Grouped Box Plot

Learn to build and use grouped box plots in the Cohort Browser.

The Cohort Browser is accessible to all users of the UK Biobank Research Analysis Platform and the Our Future Health Trusted Research Environment.

For DNAnexus Platform users, an Apollo license is required to access the Cohort Browser. Contact DNAnexus Sales for more information.

When to Use Grouped Box Plots

Grouped box plots compare the distribution of values in a numerical field across different groups in a cohort. Each group consists of the cohort members that share the same value in a second field, which contains categorical data.
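To make the idea concrete, the following minimal sketch computes the summary values that a box plot conventionally draws (minimum, quartiles, median, maximum) for each group. It is an illustration only, not the Cohort Browser's implementation: the function name `grouped_box_stats`, the field names, and the sample records are all hypothetical, and the quartile method ("inclusive") is an assumption that may differ from what the Cohort Browser uses.

```python
from collections import defaultdict
from statistics import median, quantiles


def grouped_box_stats(records, group_field, value_field):
    """Return min, Q1, median, Q3, and max of value_field for each group."""
    groups = defaultdict(list)
    for record in records:
        group = record.get(group_field)
        value = record.get(value_field)
        # Keep only real numbers; bool is a subclass of int, so exclude it.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if group is not None and is_number:
            groups[group].append(value)

    stats = {}
    for group, values in groups.items():
        values.sort()
        if len(values) >= 2:
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]  # quantiles() needs at least two data points
        stats[group] = {
            "min": values[0], "q1": q1, "median": median(values),
            "q3": q3, "max": values[-1],
        }
    return stats


# Made-up records for illustration only.
records = [
    {"doctor": "Dr. A", "visit_feeling": 1},
    {"doctor": "Dr. A", "visit_feeling": 2},
    {"doctor": "Dr. A", "visit_feeling": 3},
    {"doctor": "Dr. A", "visit_feeling": 4},
    {"doctor": "Dr. A", "visit_feeling": 5},
    {"doctor": "Dr. B", "visit_feeling": 10},
    {"doctor": "Dr. B", "visit_feeling": 20},
    {"doctor": "Dr. B", "visit_feeling": None},
]

for group, summary in grouped_box_stats(records, "doctor", "visit_feeling").items():
    print(group, summary)
```

Each key of the returned dictionary corresponds to one box in the chart; the record with a null value is skipped rather than counted.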

When creating a grouped box plot:

The primary field must contain categorical or categorical multiple data
The primary field must contain no more than 15 distinct category values
The secondary field must contain numerical data
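The three rules above can be expressed as a small pre-check. This is a hedged sketch rather than anything the Cohort Browser exposes: the function `check_grouped_box_plot_fields` and the lowercase type labels are hypothetical names chosen for illustration.

```python
MAX_CATEGORIES = 15  # limit stated for the primary field

# Hypothetical labels for the field types named in the rules.
CATEGORICAL_TYPES = {"categorical", "categorical multiple"}
NUMERICAL_TYPES = {"integer", "float"}


def check_grouped_box_plot_fields(primary_type, primary_categories, secondary_type):
    """Return a list of rule violations; an empty list means the pair is usable."""
    problems = []
    if primary_type not in CATEGORICAL_TYPES:
        problems.append("primary field must be categorical or categorical multiple")
    distinct = len(set(primary_categories))
    if distinct > MAX_CATEGORIES:
        problems.append(
            f"primary field has {distinct} categories (maximum {MAX_CATEGORIES})"
        )
    if secondary_type not in NUMERICAL_TYPES:
        problems.append("secondary field must be numerical")
    return problems


# A valid pairing produces no problems.
print(check_grouped_box_plot_fields("categorical", ["A", "B", "C"], "float"))

# Too many categories and a non-numerical secondary field produce two problems.
too_many = [f"category_{i}" for i in range(16)]
print(check_grouped_box_plot_fields("categorical", too_many, "categorical"))
```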

Supported Data Types

Primary Field

Categorical or Categorical Multiple (<=15 categories)

Secondary Field

Numerical (Integer) or Numerical (Float)

Using Grouped Box Plots in the Cohort Browser

The grouped box plot below shows a cohort broken down into groups according to the value in the field Doctor. For each group, a box plot shows the distribution of the reported Visit Feeling among cohort members who share a doctor:

Non-Numeric Data in Grouped Box Plots

A field containing numeric data may also contain some non-numeric values. These values cannot be represented in a grouped box plot. When this happens, an informational message appears below the chart, as in the example just above.

Clicking the "non-numeric values" link displays detail on those values, and the number of records in which each appears:
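As a rough illustration of that breakdown, the sketch below separates the plottable numbers in a column from the non-numeric values and counts how many records hold each non-numeric value. The function `split_non_numeric` and the sample values are hypothetical, and the choice to skip nulls and to treat NaN and infinity as non-numeric is an assumption, not documented Cohort Browser behavior.

```python
import math
from collections import Counter


def split_non_numeric(values):
    """Split raw values into a list of numbers and a count of non-numeric values."""
    numeric = []
    non_numeric = Counter()
    for value in values:
        if value is None:
            continue  # assumption: nulls are neither plotted nor listed
        try:
            number = float(value)
        except (TypeError, ValueError):
            number = None
        if number is None or not math.isfinite(number):
            non_numeric[str(value)] += 1
        else:
            numeric.append(number)
    return numeric, non_numeric


# Made-up column contents for illustration only.
raw = [3, "4.5", "unknown", None, "unknown", "declined"]
numbers, others = split_non_numeric(raw)
print(numbers)
for value, count in others.most_common():
    print(f"{value}: {count} record(s)")
```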

Outliers

Cohort Browser grouped box plots represent all non-null numeric values. When a field contains one or more outlier values (values that are unusually high or low), the resulting grouped box plot can look like this:

This grouped box plot displays data on the number of cups of coffee consumed per day by members of different groups in a particular cohort, with each group defined by a shared value in the field Coffee type. In multiple groups, one member was recorded as consuming far more cups of coffee per day than others in the group.
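Because every non-null numeric value is plotted, it can help to check a field for extreme values before charting it. The sketch below uses the common 1.5 x IQR convention to flag candidates; this page does not say that the Cohort Browser applies this rule, so treat it as an external check only. The function `iqr_outliers`, the multiplier default, and the sample data are illustrative assumptions.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3.

    Requires at least two values, because statistics.quantiles() does.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Made-up cups-per-day values for one group, for illustration only.
cups_per_day = [1, 2, 2, 3, 3, 4, 40]
print(iqr_outliers(cups_per_day))
```

With this sample, the single value of 40 is flagged. A value like this stretches the chart's axis, which compresses the boxes for the remaining values.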

Grouped Box Plots in Cohort Compare

In Cohort Compare mode, a grouped box plot compares the distribution of values in a numerical field that is common to both cohorts, across groups defined by the values of a categorical field that is also common to both cohorts.

In this scenario, a separate, color-coded box plot is displayed for each group in each cohort.
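Conceptually, this means the data is bucketed by group and by cohort, giving one distribution per (group, cohort) pair. The following sketch shows that bucketing and reports only the median of each bucket to stay short. The function `medians_by_group_and_cohort`, the cohort names, and the records are hypothetical.

```python
from collections import defaultdict
from statistics import median


def medians_by_group_and_cohort(cohorts, group_field, value_field):
    """Map each (group, cohort name) pair to the median of value_field.

    `cohorts` maps a cohort name to a list of record dictionaries.
    """
    buckets = defaultdict(list)
    for cohort_name, records in cohorts.items():
        for record in records:
            group = record.get(group_field)
            value = record.get(value_field)
            # Keep only real numbers; bool is a subclass of int, so exclude it.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if group is not None and is_number:
                buckets[(group, cohort_name)].append(value)
    return {key: median(values) for key, values in buckets.items()}


# Made-up cohorts for illustration only.
cohorts = {
    "Cohort A": [
        {"coffee_type": "espresso", "cups": 2},
        {"coffee_type": "espresso", "cups": 4},
    ],
    "Cohort B": [
        {"coffee_type": "espresso", "cups": 1},
        {"coffee_type": "filter", "cups": 3},
    ],
}

for key, value in medians_by_group_and_cohort(cohorts, "coffee_type", "cups").items():
    print(key, value)
```

A group that occurs in only one cohort, such as "filter" here, yields a single bucket.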

Hovering over one of these box plots opens an informational window showing detail on the distribution of values for the group.

Clicking the "ˇ" icon, in the lower right corner of the tile containing the chart, opens a tooltip showing the cohort names and the colors used to represent data in each.

Preparing Data for Visualization in Grouped Box Plots

When ingesting data using Data Model Loader, the following data types can be visualized in grouped box plots:

Primary Field

String Categorical
String Categorical Multi-Select
String Categorical Sparse
Integer Categorical
Integer Categorical Multi-Select

Secondary Field

Integer
Integer Sparse
Float
Float Sparse
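When reviewing a data dictionary before ingestion, the two lists above can serve as a simple lookup. The sketch below assumes the type names appear exactly as written on this page; the strings used in an actual Data Model Loader data dictionary may be spelled differently, and the function `grouped_box_plot_roles` is a hypothetical helper.

```python
# Type names copied from the lists above; actual dictionary spellings may differ.
PRIMARY_FIELD_TYPES = frozenset({
    "String Categorical",
    "String Categorical Multi-Select",
    "String Categorical Sparse",
    "Integer Categorical",
    "Integer Categorical Multi-Select",
})
SECONDARY_FIELD_TYPES = frozenset({
    "Integer",
    "Integer Sparse",
    "Float",
    "Float Sparse",
})


def grouped_box_plot_roles(data_type):
    """Return the grouped box plot roles ('primary', 'secondary') a type can fill."""
    roles = []
    if data_type in PRIMARY_FIELD_TYPES:
        roles.append("primary")
    if data_type in SECONDARY_FIELD_TYPES:
        roles.append("secondary")
    return roles


for name in ("String Categorical Sparse", "Float Sparse", "Date"):
    print(name, grouped_box_plot_roles(name))
```

A type that is in neither list, such as the "Date" placeholder used here, returns an empty list.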

