
Grouped Box Plot

Learn to build and use grouped box plots in the Cohort Browser.
The Cohort Browser is accessible to all users of the UK Biobank Research Analysis Platform and the Our Future Health Trusted Research Environment.
For DNAnexus Platform users, an Apollo license is required to access the Cohort Browser. Contact DNAnexus Sales for more information.

When to Use Grouped Box Plots

Grouped box plots can be used to compare the distribution of values in a field containing numerical data across different groups in a cohort. Each group is made up of the cohort members who share the same value in a second field that contains categorical data.
When creating a grouped box plot, note that:
  • The primary field must contain categorical or categorical multiple data
  • The primary field must contain no more than 15 distinct category values
  • The secondary field must contain numerical data
Supported Data Types
  • Primary Field: Categorical or Categorical Multiple (<=15 categories)
  • Secondary Field: Numerical (Integer) or Numerical (Float)
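The field rules above can be expressed as a small eligibility check. The following is a minimal Python sketch under stated assumptions: the `Field` descriptor, its type labels, and `supports_grouped_box_plot` are hypothetical illustrations, not part of any DNAnexus API. Only the rules themselves (categorical primary field with at most 15 categories, numerical secondary field) come from this page.

```python
from dataclasses import dataclass

# Hypothetical field descriptor; the type labels are illustrative only and are
# not taken from any DNAnexus API.
@dataclass
class Field:
    name: str
    data_type: str  # "categorical", "categorical_multiple", "integer" or "float"
    distinct_values: int = 0  # number of distinct category values (categorical only)

CATEGORICAL_TYPES = {"categorical", "categorical_multiple"}
NUMERICAL_TYPES = {"integer", "float"}
MAX_CATEGORIES = 15  # limit stated for the primary field

def supports_grouped_box_plot(primary: Field, secondary: Field) -> bool:
    """Check the primary/secondary field rules for a grouped box plot."""
    return (
        primary.data_type in CATEGORICAL_TYPES
        and primary.distinct_values <= MAX_CATEGORIES
        and secondary.data_type in NUMERICAL_TYPES
    )

# Illustrative fields; the category count is made up for the example.
doctor = Field("Doctor", "categorical", distinct_values=6)
feeling = Field("Visit Feeling", "integer")
print(supports_grouped_box_plot(doctor, feeling))  # True
print(supports_grouped_box_plot(feeling, doctor))  # False: roles swapped
```

Swapping the two fields fails the check because the primary field must be categorical, which matches the order of the rules listed above.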

Using Grouped Box Plots in the Cohort Browser

The grouped box plot below shows a cohort that has been broken down into groups according to the value in the field Doctor. For each group, a box plot shows the distribution of reported Visit Feeling values among cohort members who share a doctor:
Grouped Box Plot
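Conceptually, each box in a grouped box plot summarizes the numeric values of one group. The Python sketch below groups (group, value) pairs and computes a five-number summary per group. The function name and sample records are hypothetical, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption; the Cohort Browser may compute quartiles differently.

```python
import statistics
from collections import defaultdict

def group_summaries(records):
    """Compute a five-number summary (min, Q1, median, Q3, max) per group.

    records: iterable of (group, value) pairs, e.g. (doctor, visit_feeling).
    Null values are skipped, since only non-null values are plotted.
    """
    groups = defaultdict(list)
    for group, value in records:
        if value is not None:
            groups[group].append(value)

    summaries = {}
    for group, values in groups.items():
        values.sort()
        if len(values) > 1:
            # Assumed quartile method; the Cohort Browser's may differ.
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        else:
            q1 = median = q3 = values[0]  # a single value collapses the box
        summaries[group] = {
            "min": values[0], "q1": q1, "median": median, "q3": q3, "max": values[-1],
        }
    return summaries

# Illustrative records only; not the data behind the chart above.
records = [("Dr. A", 3), ("Dr. A", 5), ("Dr. A", 4), ("Dr. B", 2), ("Dr. B", 1), ("Dr. B", None)]
print(group_summaries(records)["Dr. A"])
# {'min': 3, 'q1': 3.5, 'median': 4.0, 'q3': 4.5, 'max': 5}
```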

Non-Numeric Data in Grouped Box Plots

In some cases, a field containing numeric data may also contain some non-numeric values. These values cannot be represented in a grouped box plot. When this happens, an informational message appears below the chart; the Doctor / Visit Feeling chart above shows an example of this message.
Clicking the "non-numeric values" link displays detail on those values, including the number of records in which each appears:
Grouped Box Plot: Detail on Non-Numeric Values
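The split between plottable and non-numeric values can be sketched in Python as follows. This is an illustration under stated assumptions: `split_numeric` and the sample values are hypothetical, "numeric" here means "parseable by `float()`", and treating nulls as neither plotted nor reported is an assumption rather than documented Cohort Browser behavior.

```python
from collections import Counter

def split_numeric(values):
    """Split a field's values into plottable numbers and counted non-numeric entries."""
    numeric = []
    non_numeric = Counter()
    for value in values:
        if value is None:
            continue  # assumed: nulls are neither plotted nor listed as non-numeric
        try:
            numeric.append(float(value))
        except (TypeError, ValueError):
            non_numeric[value] += 1  # one count per record containing this value
    return numeric, non_numeric

# Illustrative values for a numeric field that also holds some text entries.
numbers, other = split_numeric(["3", 5, "unknown", 4.5, "declined", "unknown", None])
print(numbers)  # [3.0, 5.0, 4.5]
print(other)    # Counter({'unknown': 2, 'declined': 1})
```

The `Counter` corresponds to the per-value record counts shown when the "non-numeric values" link is clicked.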

Outliers

Cohort Browser grouped box plots represent all non-null numeric values. When a field contains one or more outliers (values that are unusually high or low), the resulting grouped box plot can look like this:
Outlier Value in a Grouped Box Plot
This grouped box plot shows the number of cups of coffee consumed per day by members of different groups in a cohort, with groups defined by shared values in the field Coffee type. Note that in several groups, one member was recorded as consuming far more cups of coffee per day than the others in the group.
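Because the Cohort Browser plots every non-null numeric value, outliers are not filtered out of the chart. To check which values in a group would count as outliers under a common convention, Tukey's fences (values more than 1.5 × IQR below Q1 or above Q3) can be applied, as in the Python sketch below. This rule is a swapped-in technique for illustration, not something the Cohort Browser is documented to use; the function name and sample data are hypothetical, and the quartile method is an assumption.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Requires at least two values.
    """
    # Assumed quartile method; other methods shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Illustrative cups-per-day values for one Coffee type group, not from the chart.
cups = [1, 2, 2, 3, 3, 4, 25]
print(tukey_outliers(cups))  # [25]
```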

Grouped Box Plots in Cohort Compare

In Cohort Compare mode, a grouped box plot can be used to compare the distribution of values in a field that's common to both cohorts, across groups defined using values in a categorical field that is also common to both cohorts.
In this scenario, a separate, color-coded box plot is displayed for each group in each cohort.
Hovering over one of these box plots opens an informational window showing detail on the distribution of values for the group in question.
Clicking the "ˇ" icon in the lower right corner of the chart's tile opens a tooltip showing the cohort names and the color used to represent each cohort's data.
Grouped Box Plot in Cohort Compare mode
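In Cohort Compare mode, there is one box per group per cohort, so the data is effectively keyed by the pair (cohort, group). The Python sketch below shows that keying, reducing each box to its median for brevity. The function name, cohort names, and group values are hypothetical.

```python
import statistics
from collections import defaultdict

def compare_medians(cohorts):
    """Map {cohort: [(group, value), ...]} to {(cohort, group): median}.

    Each (cohort, group) key corresponds to one color-coded box in the chart.
    """
    buckets = defaultdict(list)
    for cohort, records in cohorts.items():
        for group, value in records:
            if value is not None:
                buckets[(cohort, group)].append(value)
    return {key: statistics.median(vals) for key, vals in buckets.items()}

# Illustrative cohorts grouped by a hypothetical Coffee type field.
cohorts = {
    "Cohort A": [("Espresso", 2), ("Espresso", 4), ("Latte", 1)],
    "Cohort B": [("Espresso", 3), ("Latte", 2), ("Latte", 4)],
}
medians = compare_medians(cohorts)
print(medians[("Cohort A", "Espresso")], medians[("Cohort B", "Espresso")])
```

Here both cohorts have a median of 3 cups for Espresso, so their Espresso boxes would be centered at the same level even though the underlying values differ.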

Preparing Data for Visualization in Grouped Box Plots

When ingesting data using Data Model Loader, note that the following data types can be visualized in grouped box plots:

Primary Field

  • String Categorical
  • String Categorical Multi-Select
  • String Categorical Sparse
  • Integer Categorical
  • Integer Categorical Multi-Select

Secondary Field

  • Integer
  • Integer Sparse
  • Float
  • Float Sparse
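Before ingestion, the type lists above can serve as a simple pre-check on a planned field pairing. The Python sketch below copies the type names verbatim from these lists; the sets and the `check_ingestion_types` helper are hypothetical and are not part of Data Model Loader. The 15-category limit for the primary field still applies separately.

```python
# Type names copied from the lists above; the helper itself is a hypothetical
# pre-check and is not part of Data Model Loader.
PRIMARY_TYPES = {
    "String Categorical",
    "String Categorical Multi-Select",
    "String Categorical Sparse",
    "Integer Categorical",
    "Integer Categorical Multi-Select",
}
SECONDARY_TYPES = {"Integer", "Integer Sparse", "Float", "Float Sparse"}

def check_ingestion_types(primary_type: str, secondary_type: str) -> list[str]:
    """Return a list of problems; an empty list means the pairing is supported.

    The 15-category limit on the primary field is not checked here.
    """
    problems = []
    if primary_type not in PRIMARY_TYPES:
        problems.append(f"{primary_type!r} cannot be used as the primary field")
    if secondary_type not in SECONDARY_TYPES:
        problems.append(f"{secondary_type!r} cannot be used as the secondary field")
    return problems

print(check_ingestion_types("String Categorical Sparse", "Float Sparse"))  # []
```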